Import silero instead of downloading it #214

Open
villesau opened this issue Oct 4, 2024 · 3 comments

@villesau
Contributor

villesau commented Oct 4, 2024

Silero can be imported, which means no torch etc needed: https://github.com/snakers4/silero-vad?tab=readme-ov-file#fast-start

This would make it easier to package whisper-timestamped for Docker, as it would avoid the hassle of pre-downloading Silero to a specific folder. I see that auditok is already imported this way.

I believe a big chunk of this could be gotten rid of along the way: https://github.com/linto-ai/whisper-timestamped/blob/master/whisper_timestamped/transcribe.py#L1952-L1981 and done this way instead: https://github.com/linto-ai/whisper-timestamped/blob/master/whisper_timestamped/transcribe.py#L2007

The user of this library could then pin the silero version in their requirements.txt

As a side note, I also think Silero could perhaps be used to further improve timestamp accuracy 🤔 Based on my quick testing, WhisperX still has a slight edge over whisper-timestamped. It gives results that are both more precise (three digits vs two digits) and more accurate.

@Jeronymous
Member

Jeronymous commented Oct 4, 2024

> Silero can be imported, which means no torch etc needed

torch is needed by openai-whisper (the backend of whisper-timestamped):
https://github.com/openai/whisper/blob/main/requirements.txt#L3
so in this regard, how silero is imported changes nothing
(and silero itself probably uses torch...)

Also note that silero is not in the requirements of whisper-timestamped
https://github.com/linto-ai/whisper-timestamped/blob/master/requirements.txt
It is only needed if silero is used.

> I believe a big chunk of this could be gotten rid of along the way

This ugly piece of code is a workaround to be able to reach old versions of silero (because there are some issues in the early packagings of silero).
The thing is that we saw a performance degradation (for our use cases) with the latest versions of silero, so we decided to keep the old versions accessible, despite the ugly code needed to make that work.
This also helps reproducibility for people experimenting with silero + whisper.

The current packaging of silero also allows several silero models (versions) to coexist on the same system,
which makes it possible to call whisper-timestamped with different silero settings through a single integration.

I understand that for some use cases it might be useful to use:

from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

An option "vad=silero_from_pip" (or similar name) could be implemented to switch to the silero that is installed with pip/python.
Maybe you can open a fork and a PR with that suggestion?
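
To make the suggestion a bit more concrete, here is a minimal sketch of how such an option could dispatch to the pip-installed package. The helper names `load_vad_backend` and `detect_speech` are hypothetical and do not exist in whisper-timestamped; only `load_silero_vad` and `get_speech_timestamps` come from the `silero_vad` package. The import is done lazily so that silero stays an optional dependency.

```python
import importlib


def load_vad_backend(vad="silero_from_pip"):
    """Return (model, get_speech_timestamps) for the requested VAD option.

    Hypothetical helper: only the "silero_from_pip" branch is sketched here.
    """
    if vad != "silero_from_pip":
        raise ValueError(f"unsupported vad option in this sketch: {vad!r}")
    try:
        # Lazy import, so silero-vad is only required when this option is used.
        silero_vad = importlib.import_module("silero_vad")
    except ImportError as err:
        raise ImportError(
            "vad='silero_from_pip' requires `pip install silero-vad`"
        ) from err
    model = silero_vad.load_silero_vad()
    return model, silero_vad.get_speech_timestamps


def detect_speech(audio, vad="silero_from_pip", sampling_rate=16000):
    """Run the selected VAD on an audio tensor and return its speech segments."""
    model, get_speech_timestamps = load_vad_backend(vad)
    return get_speech_timestamps(audio, model, sampling_rate=sampling_rate)
```

With this shape, the silero version would be whatever the user has installed (and pinned) in their environment, while the existing versioned options could stay untouched.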

@villesau
Contributor Author

villesau commented Oct 4, 2024

silero_from_pip could make sense! This is what I ended up doing when deploying to Replicate: villesau/whisper-timestamped-replicate@367dd57
It needs to be run before building.

@villesau
Contributor Author

villesau commented Oct 4, 2024

> As a side note, I also think Silero could perhaps be used to further improve timestamp accuracy 🤔 Based on my quick testing, WhisperX still has a slight edge over whisper-timestamped. It gives results that are both more precise (three digits vs two digits) and more accurate.

@Jeronymous do you think this makes any sense? I'm not too familiar with either, but from what I understand about silero-vad, there could be a chance for improved accuracy.
