multi-speaker multi-track audio #131
What is in your stereo?
Hey, thanks. Actually, I am trying to stream a conversation between two people (an agent and a customer). Can you suggest some guides where I can study more on this? Any suggestion would be great.
If you have the voices in separate tracks, that's good: you don't need diarization (though it's a good topic to know about). Then you probably need a voice activity controller that sends the speaking parts of each track into WhisperStreaming. You can modify this class: whisper_streaming/whisper_online.py Line 518 in 0a00254
You can use multiple Silero VADIterator objects: https://github.com/ufal/whisper_streaming/blob/main/silero_vad.py , one for each track, to control sending the voice to the OnlineASRProcessor. When it returns output, wrap it with information about who spoke it. In any case, you should make sure that the context of the previous turns is not cleared: use finalize(), but do not clear the HypothesisBuffer.
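The per-track routing described above can be sketched as follows. This is a minimal, runnable illustration of the control flow only: `EnergyVAD` and `FakeASR` are toy stand-ins I made up for Silero's `VADIterator` and whisper_streaming's `OnlineASRProcessor`, which you would substitute in a real pipeline.

```python
# Hedged sketch: per-track voice-activity gating for a two-speaker stream.
# EnergyVAD and FakeASR are hypothetical stand-ins so the routing logic
# itself runs; swap in Silero's VADIterator and OnlineASRProcessor in practice.

def split_stereo(frames):
    """Split interleaved stereo samples [(L, R), ...] into two mono tracks."""
    left = [l for l, _ in frames]
    right = [r for _, r in frames]
    return left, right

class EnergyVAD:
    """Toy stand-in for a VAD: 'speech' means RMS energy above a threshold."""
    def __init__(self, threshold=0.1):
        self.threshold = threshold

    def is_speech(self, chunk):
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        return rms > self.threshold

class FakeASR:
    """Stand-in for an online ASR processor: counts the chunks it receives."""
    def __init__(self):
        self.chunks = 0

    def insert_audio_chunk(self, chunk):
        self.chunks += 1
        return f"[{self.chunks} chunks transcribed]"

def route(frames, chunk_size=4):
    """One VAD and one ASR per track; tag every output with its speaker."""
    left, right = split_stereo(frames)
    tracks = {"agent": (left, EnergyVAD(), FakeASR()),
              "customer": (right, EnergyVAD(), FakeASR())}
    outputs = []
    for speaker, (track, vad, asr) in tracks.items():
        for i in range(0, len(track), chunk_size):
            chunk = track[i:i + chunk_size]
            if vad.is_speech(chunk):          # gate: only speech reaches ASR
                text = asr.insert_audio_chunk(chunk)
                outputs.append((speaker, text))
    return outputs

# Agent speaks (loud left channel); customer is silent (quiet right channel).
frames = [(0.5, 0.0)] * 8
print(route(frames))
```

Because each track keeps its own ASR state, the previous turns' context survives across speaker changes, which is the point of using finalize() without clearing the hypothesis buffer.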
Thank you for your input, I will work on it.
Hi,
I am trying to transcribe live stereo audio. Are there any recommended methods to implement this? I have tried converting the stereo to mono, but my results are very inaccurate.
Thanks in advance for the help
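A small illustration of why downmixing hurts here: when both parties speak at once, averaging the channels mixes (or even cancels) the two voices, while keeping each channel as its own mono track preserves one clean speaker per track. The samples below are synthetic; real code would read the channels from the audio stream instead.

```python
# Hedged sketch: downmixing vs. channel-splitting for a two-speaker call,
# using synthetic interleaved samples [(L, R), ...].

def downmix(frames):
    """Average L and R into one mono track (what the question describes)."""
    return [(l + r) / 2 for l, r in frames]

def split(frames):
    """Keep each channel as its own mono track (one track per speaker)."""
    return [l for l, _ in frames], [r for _, r in frames]

# Both speakers talk at once: agent on the left, customer on the right.
frames = [(0.8, -0.8)] * 4

print(downmix(frames))  # overlapped speech cancels into silence: all zeros
left, right = split(frames)
print(left, right)      # each track still carries exactly one speaker
```

The total cancellation above is a contrived worst case, but partial interference from cross-talk is exactly what makes a downmixed transcript inaccurate; splitting avoids it entirely.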