
Not all indexed objects are checked against the database in case index/database are out of sync #5379

Open
thomaslow opened this issue Oct 7, 2022 · 1 comment

@thomaslow
Collaborator

thomaslow commented Oct 7, 2022

There is a code segment that checks whether already indexed objects are still available in the database and removes them from the index if they are not. This situation occurs only in rare scenarios, e.g., when the database is changed manually, or when database and index backups that were not created at exactly the same time are restored.

Unfortunately, the implementation only works for a limited number of objects, because it uses a search query that can retrieve at most max_result_window objects (an Elasticsearch parameter, usually 10,000). If the index contains more objects than that, the check fails.

Instead, the code should use a method that actually allows iterating over all indexed objects, independent of any such limit, as sketched below.
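A minimal sketch of what such an iteration could look like, assuming the Elasticsearch high-level REST client (RestHighLevelClient) and a caller-supplied index name; exact import paths (e.g. TimeValue) vary with the client version, and search_after / point-in-time would be an equivalent alternative to the scroll API used here:

```java
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class IndexedIdIterator {

    /**
     * Collects the IDs of all documents in the given index, page by page,
     * using the scroll API so the total is not capped by max_result_window.
     * (Illustrative only; client and index name are assumptions.)
     */
    static List<String> collectAllIndexedIds(RestHighLevelClient client, String index) throws IOException {
        List<String> ids = new ArrayList<>();

        SearchRequest searchRequest = new SearchRequest(index);
        searchRequest.scroll(TimeValue.timeValueMinutes(1));
        searchRequest.source(new SearchSourceBuilder()
                .query(QueryBuilders.matchAllQuery())
                .fetchSource(false)   // only document IDs are needed for the comparison
                .size(1000));         // page size, well below max_result_window

        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();
        SearchHit[] hits = response.getHits().getHits();

        // keep scrolling until a page comes back empty
        while (hits != null && hits.length > 0) {
            for (SearchHit hit : hits) {
                ids.add(hit.getId());
            }
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(TimeValue.timeValueMinutes(1));
            response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = response.getScrollId();
            hits = response.getHits().getHits();
        }

        // release the server-side scroll context
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

        return ids;
    }
}
```

The IDs collected this way could then be compared against the database, and entries without a database counterpart removed from the index, without ever hitting the max_result_window cap.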

@matthias-ronge
Collaborator

This situation happens when an exception during process deletion leaves the index with too many objects; the only work-around is then to delete and rebuild the whole index. It doesn't happen often, but when it does, for an instance with thousands of processes it can mean a day of downtime, which is very bad news for a digitization centre. Therefore I am adding the blocking-bug label, because this is something we should fix really, really soon.

Labels: Development
