Documentation search results relevance improvements #1097
Comments
In English, the word "through" is a stopword and is ignored in the search against the English dictionary used in PostgreSQL. From the PostgreSQL documentation:
In other dictionaries, for example the Italian one, the English word "through" is not a stopword, and a search in that language does show results: |
I figured it might be something like that. Framework function names and stuff should bypass that logic somehow. |
@boxed I don't think "through" is the only stopword that matters in search. |
Hm.. I don't know about a complete list. But certainly "where" is suspicious as it's a keyword in SQL. This becomes a bit tricky as "where" should probably just be searched when it's in a code block like That's what I could find reading through this list. I think one could imagine a solution where the search is run and, if there are no hits, it's re-run without filtering out stopwords. This would fix the worst case at least. |
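For illustration, the "re-run when there are no hits" idea could be sketched roughly like this. It is a toy in-memory version, not the code behind the docs site; `STOPWORDS`, `DOCS`, `search` and `search_with_fallback` are all hypothetical names, and the real search runs in PostgreSQL.

```python
# Toy sketch: every name and all data here are hypothetical stand-ins.
STOPWORDS = {"through", "where", "the", "a", "by"}  # tiny sample stopword list

DOCS = {
    "ManyToManyField.through": "the through option names an intermediate model",
    "QuerySet.filter": "filter returns a new queryset matching the lookups",
}


def tokenize(query, drop_stopwords):
    """Lowercase and split the query, optionally dropping stopwords."""
    words = query.lower().split()
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words


def search(query, drop_stopwords=True):
    """Return titles of documents that contain every query term."""
    terms = tokenize(query, drop_stopwords)
    if not terms:
        return []  # nothing left to match once stopwords are removed
    return [
        title
        for title, body in DOCS.items()
        if all(t in f"{title} {body}".lower().split() for t in terms)
    ]


def search_with_fallback(query):
    """Search normally; if there are no hits, retry while keeping stopwords."""
    results = search(query, drop_stopwords=True)
    if not results:
        results = search(query, drop_stopwords=False)
    return results
```

With this toy data, `search("through")` comes back empty while `search_with_fallback("through")` finds the `ManyToManyField.through` entry. Queries that already have hits never reach the second pass.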
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We can try to create a custom English dictionary whose stopword list leaves out the words that are relevant to Django. |
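As a rough sketch of the custom-dictionary idea: PostgreSQL stopword files are plain text with one word per line, so a trimmed list could be generated from the stock one. The word lists below are a made-up sample, not the real English list, and `custom_stopwords` is a hypothetical helper. Wiring the result into a text search dictionary (for example via its `StopWords` option) is assumed and not shown.

```python
def custom_stopwords(base_stopwords, keep_searchable):
    """Return the base stopwords minus the words that must stay searchable."""
    keep = {w.lower() for w in keep_searchable}
    return [w for w in base_stopwords if w.lower() not in keep]


# Hypothetical sample of an English stopword list and of Django-relevant terms.
base = ["a", "the", "through", "where", "by", "and"]
django_terms = ["through", "where"]

# One word per line, the layout a stopword file uses.
print("\n".join(custom_stopwords(base, django_terms)))
```

The open question is which words belong in `django_terms`; nobody in this thread has a complete list yet.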
Noting the issue with stopwords – and also from #1496, we got the following recommendation:
I’ve re-titled the issue accordingly so we consider more improvements than just stopwords refinements. Related: Site-wide search #1499. |
Considering this simple and limited scope change has seen no improvement in several years, I don't think broadening the scope of the issue is a good idea. Talking about this issue not moving forward... Could we maybe consider building something simple in front of the current code that does very simple string matching on just the titles in the documentation and shows those matches first? Maybe other hard-coded searches could be added too, since for example searching "group by" shows nothing of relevance. |
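A minimal sketch of what "title matches first" could look like, assuming the page titles and the existing full-text results are already available as plain lists. `title_matches` and `combined_results` are hypothetical names, and the titles are only sample data.

```python
def title_matches(query, titles):
    """Case-insensitive substring match of the query against page titles."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [t for t in titles if needle in t.lower()]


def combined_results(query, titles, full_text_results):
    """Put title matches first, then full-text hits not already listed."""
    first = title_matches(query, titles)
    rest = [r for r in full_text_results if r not in first]
    return first + rest


# Sample data only: titles and full-text hits are made up for the example.
titles = ["Aggregation", "Making queries", "QuerySet API reference"]
full_text = ["Making queries", "QuerySet API reference"]
print(combined_results("queryset", titles, full_text))
```

Because the title pass is plain substring matching, it never consults the stopword list, so it would still work for queries the full-text search drops entirely.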
If anyone really wants to fix the issue with stopwords only – that’s still as welcome as it was until now. This is a volunteer-run project, and this hasn’t been picked up in three years of it being defined as quite a narrow improvement. I think putting this in the broader context of search improvements will make it clearer to potential contributors what the goal here is. Personally what I’d like to see is a more strategic approach to this where we look at analytics on what searches are being made that have 0 results. I don’t like the idea of hard-coded searches as we simply don’t have the capacity to maintain that kind of content. I’d rather we set up boosting based on headings (if that’s not already the case). |
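To make the analytics idea concrete, here is a small sketch that tallies zero-result searches. It assumes a log of `(query, result_count)` pairs is available; that log format and the name `zero_result_queries` are assumptions, since no such data has been collected yet.

```python
from collections import Counter


def zero_result_queries(search_log, top=10):
    """Count normalized queries that returned no results, most common first."""
    counts = Counter(
        query.strip().lower()
        for query, result_count in search_log
        if result_count == 0
    )
    return counts.most_common(top)


# Made-up log entries for illustration.
log = [("through", 0), ("Through", 0), ("group by", 0), ("queryset", 42)]
print(zero_result_queries(log))
```

Normalizing case and whitespace before counting keeps "through" and "Through" from being reported as separate gaps.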
I agree on the statistics being very useful. |
We’ve decided the next steps are:
|
@thibaudcolas What is the status (if any) on getting data for docs searches over some period of time? Is the ops team tracking this need? I think we asked someone from ops about it during Sprints at DjangoCon US, but my memory isn't always great, and I'm not sure if there is a formal process for making a request like this or if the working group simply asking ops is sufficient. |
@jacklinke yup, see #1628. |
To help us consider this and similar search improvements, I’ve requested help from Algolia to get the Django docs indexed in their Algolia DocSearch program. They provide free access to their Algolia Search product, for projects looking for developer documentation search. Here’s where you can trial how it works: Trial: Algolia DocSearch. This page is only intended to try out a different search implementation so we can improve ours, like we also have the Sphinx search setup available on django.readthedocs.io. I’ve only set it up to index Django 5.1 in English at this time.
If anyone would like access to the behind-the-scenes search admin please let me know. I use DocSearch for other projects so can give you a tour. |
I don't think the best solution is to use an external engine here. We have spent effort and time to use Django itself for the search in the documentation and to remove a lot of issues caused by the Elasticsearch synchronization. The search function just needs a little tweaking. There have been many complaints over the years but unfortunately little help. I'm glad to see some interest in this area. I still think that the Django website is also a showcase to demonstrate its potential as a web framework, and using an external search engine would be like admitting that Django's full-text search is not good enough to be used in a web portal. I would rather spend the effort needed to integrate an external engine on improving the search we already have, and the documentation too. |
Algolia looks very nice and fast, and it's great to see that it handles the sectioned docs (which Google, Read the Docs, and the current system all fail on). |
This is a bit of a sunk cost fallacy. This issue has existed for many years and it's still not solved, and it's not exactly a minor issue.
Django is a comprehensive framework. Django provides the tools to build a full-text search solution, but in a case like the documentation, which is a pretty complicated one, we don't seem to have the resources to do it. That doesn't mean it cannot be done in Django, but there are specific issues that need a custom implementation for each use case:
Can all of these things be solved with Django's tools? Yes! That doesn't mean that, with the limited resources available, we should dedicate them to building and maintaining a comprehensive full-text search that is as good as, or close to, Algolia or another third-party option. Basically, maybe we should actually consider Algolia or a third-party solution rather than building our own. |
I shared my trial so we could compare another implementation; I wouldn’t recommend anyone consider another search engine at this point in time. Once the proposed website working group is up and running, we can ask them whether they’d consider such a big shift, and if so review multiple options, and if not make a plan to fix those long-standing search-related UX issues. @pauloxnet with the current engine – what do you think of implementing type-ahead search, and changing the index so each entry is a section of a page, rather than the whole page? |
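To show what "each entry is a section of a page" might mean in practice, here is a sketch of a per-section index entry. `SectionEntry` and `section_entries` are hypothetical and do not describe the current index; the path and anchor are taken from one of the URLs in this issue, and the heading and text are placeholder sample values.

```python
from dataclasses import dataclass


@dataclass
class SectionEntry:
    """One search index entry per page section instead of per page."""

    page_path: str
    anchor: str
    heading: str
    text: str

    @property
    def url(self):
        # Results can link straight to the section instead of the page top.
        return f"{self.page_path}#{self.anchor}"


def section_entries(page_path, sections):
    """Build one entry per (anchor, heading, text) section of a page."""
    return [
        SectionEntry(page_path, anchor, heading, text)
        for anchor, heading, text in sections
    ]


entries = section_entries(
    "topics/db/models/",
    [
        (
            "extra-fields-on-many-to-many-relationships",
            "Extra fields on many-to-many relationships",
            "sample section text mentioning the through argument",
        )
    ],
)
print(entries[0].url)
```

Keeping the heading as its own field would also make heading-based boosting straightforward, since it can be weighted separately from the body text.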
Searching for "through" finds nothing:
This search should link at least these:
https://docs.djangoproject.com/en/3.2/topics/db/models/#extra-fields-on-many-to-many-relationships
https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.ManyToManyField.through