Multiple videos fails to be encoded in khan_academy_fr #101
This occurs at least when same video file ( The code takes into account that a given content_file Command used:
ffmpeg error is not displayed in the log due to poor usage of the deduplicating videos to process is not as straightforward due to the @rgaudin do you remember if you already saw a given |
Sorry @benoit74 but despite all this information, it's not clear. The ticket mentions an issue with the ffmpeg process.
You seem to have encountered something wrong or hard to understand around IDs while looking at this, but it seems unrelated to the current issue. Can you maybe open a separate ticket? |
The exact ffmpeg error is (I just fixed this part, we now have clear logs):
This happens because we remove the original file once it is encoded, yet we try to encode the same file three times because three nodes use that file. |
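One way to avoid the repeated encode described above is to group nodes by the file they reference and encode each file only once. The following is a minimal sketch, not the scraper's actual code: `group_nodes_by_file`, `encode_once_per_file`, and the `(node_id, file_id)` tuples are hypothetical names chosen for illustration, and `encode` stands in for whatever function really runs ffmpeg.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def group_nodes_by_file(nodes: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each file ID to the node IDs that reference it (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node_id, file_id in nodes:
        groups[file_id].append(node_id)
    return dict(groups)


def encode_once_per_file(
    nodes: Iterable[Tuple[str, str]],
    encode: Callable[[str], str],
) -> Dict[str, str]:
    """Call encode(file_id) once per distinct file and share the result.

    Returns a mapping of node ID to the encoded output, so every node
    that uses the same source file points to the same encoded file.
    """
    results: Dict[str, str] = {}
    for file_id, node_ids in group_nodes_by_file(nodes).items():
        # Runs once, even when several nodes share the file, so the
        # source is never requested again after it has been removed.
        output = encode(file_id)
        for node_id in node_ids:
            results[node_id] = output
    return results
```

With three nodes pointing at the same file, `encode` is invoked a single time for that file, and all three nodes receive the same output.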
Now that's clearer. Is there any remaining question? |
Since there is indeed a remaining question, I assume it is still not crystal clear 🤣 Let me take one example from Khan Academy FR. New glossary (previous one was confusing):
Inside the database, we encounter the kind of situation below.
In other words, the same This causes the above-mentioned ffmpeg error because:
Digging a bit deeper, the scraper logic feels a bit weird, because:
My proposal is then to change the scraper logic:
The consequence is that we will re-encode all videos, but this is probably the right moment because we have only processed Khan Academy FR so far. We could even imagine manually moving (with a script, of course) all files from WDYT about my proposal? (this is the remaining question 😉) |
It makes sense 👍 . I don't recall exactly but I remember we were working mostly off a couple channels so we had to guess the intent based on limited samples. I think we wanted to store stuff on S3 with the ID that's on the studio so that a video used in multiple channels (not multiple nodes) would be stored once and would be detected when checking the bucket. The downloading/encoding looks like this now that you've changed the way concurrency works. Because download is I/O bound and encode is CPU-bound, I believe we had the two loops being consumed together. |
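The idea of running I/O-bound downloads and CPU-bound encodes side by side can be sketched with two separately sized pools. This is only an illustration under assumptions, not the scraper's implementation: `download_then_encode`, `download`, `encode`, and the worker counts are hypothetical. Threads are used for both stages on the assumption that the encode step launches ffmpeg as an external process, so the CPU work happens outside the Python interpreter.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_then_encode(
    file_ids: Iterable[str],
    download: Callable[[str], str],
    encode: Callable[[str], str],
    download_workers: int = 4,
    encode_workers: int = 2,
) -> Dict[str, str]:
    """Download each distinct file, then encode it as soon as it is available.

    Returns a mapping of file ID to the value returned by encode().
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=download_workers) as dl_pool, \
            ThreadPoolExecutor(max_workers=encode_workers) as enc_pool:
        # dict.fromkeys drops duplicate IDs while keeping their order,
        # so a file shared by several nodes is downloaded only once.
        dl_futures = {
            dl_pool.submit(download, fid): fid for fid in dict.fromkeys(file_ids)
        }
        enc_futures = {}
        for fut in as_completed(dl_futures):
            fid = dl_futures[fut]
            # Hand the downloaded path to the encode pool immediately,
            # while the remaining downloads keep running.
            enc_futures[enc_pool.submit(encode, fut.result())] = fid
        for fut in as_completed(enc_futures):
            results[enc_futures[fut]] = fut.result()
    return results
```

Sizing the pools independently lets many downloads wait on the network while only a few encodes compete for CPU. In this sketch an exception in either stage propagates from `fut.result()`; a real pipeline would likely want to record the failure per file instead.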
1746 videos have failed to be re-encoded in https://farm.openzim.org/pipeline/62191f74-ff73-473d-acc3-49af55fb5f8b/debug (but 2869 have succeeded, so the ratio is not totally bad ^^)
We unfortunately do not have the detailed stdout/stderr of ffmpeg in the log.
Sample error: