Investigate better approaches for the domain filter and url_search_string filter #832

Open
pgulley opened this issue Oct 22, 2024 · 1 comment
@pgulley
Member

pgulley commented Oct 22, 2024

At the very least, we discovered that "field_name:some_string" in the query string is interpreted by default as a "contains" expression, not an "is equal to" expression. That means we are potentially overmatching on canonical domain (depending on how Elasticsearch's tokenizer interprets these strings), and the wildcard might be totally redundant. Or it might not be. An afternoon spent poking at it in Kibana would quickly reveal the truth.
Either way, the escaping we're doing now is totally redundant and might be impacting search results as well.
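
To make the "contains" vs. "is equal to" distinction concrete, here is a minimal sketch of the two Elasticsearch query shapes one could compare in Kibana. The field name `canonical_domain` and the helper names are assumptions for illustration, not the project's actual code. Whether the `query_string` form really overmatches depends on the field mapping (an analyzed `text` field vs. a `keyword` field), which this sketch does not know.

```python
import json


def query_string_clause(field: str, value: str) -> dict:
    """Build a query_string clause of the form field:value.

    The value goes through the field's search analyzer, so on an
    analyzed text field it matches documents containing the resulting
    tokens rather than documents whose field equals the value.
    """
    return {"query_string": {"query": f"{field}:{value}"}}


def term_clause(field: str, value: str) -> dict:
    """Build a term clause in filter context.

    The value is not analyzed; it is compared as-is against the indexed
    terms, which gives exact matching on a keyword field.
    """
    return {"bool": {"filter": [{"term": {field: value}}]}}


if __name__ == "__main__":
    # "canonical_domain" is a hypothetical field name for this example.
    for clause in (
        query_string_clause("canonical_domain", "example.com"),
        term_clause("canonical_domain", "example.com"),
    ):
        print(json.dumps({"query": clause}, indent=2))
```

Pasting both request bodies into Kibana's Dev Tools against the same index and comparing hit counts would be one way to check whether the current filter overmatches.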

Related:
Should constructing the filter search strings even happen in web_search? That feels like something we should be handling in the news-search-api (NSA). At the very least, it feels like a better developer pattern to expect to update the NSA whenever we want to change the syntax of our queries.
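
As a rough sketch of what that could look like: the caller passes structured filter values and the API side alone decides the query syntax. The function name `build_source_filter` and the field names `canonical_domain` and `url` are hypothetical, and the clause types used here (`terms`, `wildcard`) are one possible choice, not a description of what the NSA does today.

```python
def build_source_filter(domains: list[str], url_patterns: list[str]) -> dict:
    """Turn structured filter values into a single Elasticsearch bool clause.

    Hypothetical server-side helper: clients send plain lists, so a change
    in query syntax only touches this function.
    """
    should: list[dict] = []
    if domains:
        # Exact match on any of the listed domains (assumes a keyword field).
        should.append({"terms": {"canonical_domain": sorted(domains)}})
    for pattern in url_patterns:
        # Substring match on the URL; leading wildcards can be slow.
        should.append({"wildcard": {"url": {"value": f"*{pattern}*"}}})
    if not should:
        # No filter values supplied: do not restrict the search.
        return {"match_all": {}}
    return {"bool": {"should": should, "minimum_should_match": 1}}


if __name__ == "__main__":
    print(build_source_filter(["example.com"], ["example.org/news"]))
```

With this shape, no escaping or string assembly is needed on the client, since values are never spliced into a query string.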

@philbudne
Contributor

I wonder if it all belongs in mc-providers, and news-search-api should go away?

Also:
Do today's revelations open up the possibility that url_search_strings CAN be done on IA?
