[gitindex] Indexing repositories with malformed documents / missing blobs #73
Comments
For invalid UTF-8 sequences, we should just insert a placeholder and continue. I took a quick look at the code, and I think it already does that. Can you verify that it really aborted on invalid UTF-8?
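To illustrate the "insert a placeholder and continue" idea, here is a minimal Go sketch. It is not taken from the zoekt code base; `sanitizeUTF8` is a hypothetical helper, and the choice of U+FFFD as the placeholder is an assumption. It relies only on the standard library's `utf8.Valid` and `bytes.ToValidUTF8`.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// sanitizeUTF8 is a hypothetical helper (not part of zoekt).
// It returns content unchanged when it is already valid UTF-8;
// otherwise it replaces each run of invalid bytes with U+FFFD.
func sanitizeUTF8(content []byte) []byte {
	if utf8.Valid(content) {
		return content
	}
	return bytes.ToValidUTF8(content, []byte("\uFFFD"))
}

func main() {
	// Two stray bytes that do not form a valid UTF-8 sequence.
	raw := []byte("caf\xff\xfe ok")
	clean := sanitizeUTF8(raw)
	fmt.Println(utf8.Valid(raw), utf8.Valid(clean))
}
```

With this approach a document containing bad bytes can still be indexed, at the cost of the placeholder appearing in the indexed text.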
This is the error I get. I only suspect it is an encoding / UTF-8 error; honestly, I did not search for the root cause.
Aha. Could you share the file with me? Or maybe make a smaller reproducer? You probably need to cut off runes from the start in multiples of 100.
The file is already attached to the previous comment. I've zipped it because otherwise GitHub does not let me upload it.
I've refactored the patch and uploaded it to Gerrit (a new commit with a new change ID). Honestly, I'm a little confused by the Gerrit workflow, as I have never worked with it before. Please tell me if I have to change something. Thanks.
Can't repro. Which version were you using?
When running zoekt-git-index on all of our Git repositories, I noticed that a few repositories were missing from the search. After digging into it, I discovered that the indexer aborts at the first indexing error. Since a repository may occasionally contain a malformed document (e.g. one with invalid UTF-8 sequences), the indexer should be able to ignore these errors.
I've added a ContinueOnError flag to allow indexing of repositories with missing blobs and malformed documents. These repositories should be fixed anyway, but in the meantime only the broken files are skipped rather than the whole repository.