Import silero instead of downloading it #214
Comments
torch is needed by openai-whisper (the backend of whisper-timestamped). Also note that silero is not in the requirements of whisper-timestamped.
This ugly piece of code is a workaround to be able to reach old versions of silero (because there are some issues in the early packagings of silero). The current packaging of silero also allows having several silero models (versions) on the same system. I understand that for some use cases it might be useful to use:
An option "vad=silero_from_pip" (or a similar name) could be implemented to switch to the silero that is installed with pip/python.
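As a rough illustration of the option suggested above, here is a minimal sketch of how a `vad` value could be mapped to a silero source. The function name `resolve_silero_source`, the return labels, and the exact option strings are hypothetical and are not part of whisper-timestamped. The only real call used is `importlib.util.find_spec`, which checks whether the `silero_vad` module from pip can be found.

```python
import importlib.util


def resolve_silero_source(vad: str) -> str:
    """Map a vad option value to "pip" or "download" (hypothetical labels)."""
    if vad == "silero_from_pip":
        # Fail early with a clear message if the pip package is missing.
        if importlib.util.find_spec("silero_vad") is None:
            raise ImportError(
                "vad='silero_from_pip' requires the silero-vad package "
                "(pip install silero-vad)"
            )
        return "pip"
    if vad.startswith("silero"):
        # Any other silero option keeps the existing download-based loading,
        # which can still reach older silero versions.
        return "download"
    raise ValueError(f"not a silero option: {vad!r}")
```

With this shape the current behavior stays the default, and the pip-installed silero is used only when explicitly requested.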
@Jeronymous do you think this makes any sense? I'm not too familiar with either, but from what I understood about silero-vad, there could be a chance for improved accuracy.
Silero can be imported, which means no torch etc. is needed: https://github.com/snakers4/silero-vad?tab=readme-ov-file#fast-start
This would make it easier to package whisper-timestamped into Docker, as it would avoid the hassle of predownloading Silero to a specific folder. I see that auditok is already imported this way.
I believe a big chunk of this could be gotten rid of along the way: https://github.com/linto-ai/whisper-timestamped/blob/master/whisper_timestamped/transcribe.py#L1952-L1981 and done in this way instead: https://github.com/linto-ai/whisper-timestamped/blob/master/whisper_timestamped/transcribe.py#L2007
The user of this library could then pin the silero version in their requirements.txt
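For reference, a minimal sketch of what the import-based approach might look like. It assumes a recent `silero-vad` pip package that exposes `load_silero_vad`, `read_audio` and `get_speech_timestamps`, as in the README fast start linked above. The wrapper name `detect_speech` is hypothetical, and the import is kept inside the function so that silero stays an optional dependency.

```python
def detect_speech(audio_path: str, sampling_rate: int = 16000):
    """Return speech segments (in seconds) using the pip-installed silero-vad."""
    # Imported lazily: only needed when this VAD backend is actually used.
    from silero_vad import get_speech_timestamps, load_silero_vad, read_audio

    model = load_silero_vad()  # model weights ship with the pip package
    wav = read_audio(audio_path, sampling_rate=sampling_rate)
    return get_speech_timestamps(
        wav,
        model,
        sampling_rate=sampling_rate,
        return_seconds=True,  # start/end in seconds rather than samples
    )
```

The silero version would then be whatever the user pins for `silero-vad` in their own requirements.txt.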
As a side note, I also think Silero could perhaps be used to further enhance the timestamp accuracy 🤔 Based on my quick testing, WhisperX still has a slight edge over whisper-timestamped. It gives both more precise (three digits vs two digits) and more accurate results.