
Automatically refine word-level alignments from sentence-level alignments #106

ryanfb opened this issue Jun 20, 2022 · 2 comments

ryanfb commented Jun 20, 2022

First of all, thanks so much for all your work on this and for making it open source! It would be cool if it were possible to do a fragment search using an existing SRT transcription without having to re-transcribe all of the audio in advance. One way to do this would be to use the existing sentence-level alignments to extract the audio ranges for sentences that match a search, then use Vosk to transcribe just those audio ranges, and finally use those word-level results to extract the fragment-level audio.
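A minimal sketch of the first step of this idea: parsing the existing SRT file into sentence-level time ranges and finding the ranges whose text matches a search. The function names are hypothetical and the SRT parsing is deliberately simple (no handling of styling tags or multi-cue overlaps); the Vosk re-transcription of the matching ranges would be a separate step.

```python
import re

def parse_srt(srt_text):
    """Parse an SRT transcript into (start_seconds, end_seconds, text) tuples."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        # Timestamp line, e.g. "00:00:01,000 --> 00:00:03,500"
        m = re.match(
            r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)",
            lines[1],
        )
        if not m:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        entries.append((start, end, " ".join(lines[2:])))
    return entries

def matching_ranges(entries, query):
    """Return the (start, end) ranges whose text contains the query."""
    q = query.lower()
    return [(s, e) for s, e, text in entries if q in text.lower()]
```

Only the matching ranges would then need to be cut out of the source audio and fed to Vosk for word-level timestamps, instead of transcribing the whole file up front.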

antiboredom (Owner) commented

That's an interesting idea; I'd definitely be open to experimenting with it. Forced alignment might also work here: alphacep/vosk-api#756


cmprmsd commented Dec 22, 2023

It would also be beneficial to rely on the words sourced from the subtitle file; constraining recognition to that vocabulary could improve detection quality a lot, right? I tried to implement @ryanfb's suggestion back in 2021 with pocketsphinx, but the results weren't promising. 😢
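One way to use the subtitle vocabulary, as a hedged sketch: Vosk's `KaldiRecognizer` accepts an optional JSON grammar (a list of expected words/phrases), which restricts decoding to that vocabulary. The helper below builds such a grammar from subtitle text; the model path, sample rate, and audio handling in the commented usage are assumptions, since running the recognizer requires a downloaded Vosk model and 16 kHz mono PCM audio.

```python
import json
import re

def subtitle_grammar(subtitle_text):
    """Build a Vosk grammar (a JSON list of words) from subtitle text,
    so the recognizer only considers words we expect to hear."""
    words = sorted(set(re.findall(r"[a-z']+", subtitle_text.lower())))
    # "[unk]" gives the recognizer a catch-all token for everything else
    return json.dumps(words + ["[unk]"])

# Hypothetical usage (requires a downloaded Vosk model and 16 kHz mono PCM):
# from vosk import Model, KaldiRecognizer
# model = Model("model")  # path is an assumption
# rec = KaldiRecognizer(model, 16000, subtitle_grammar(srt_text))
# rec.SetWords(True)      # request per-word timestamps
# rec.AcceptWaveform(pcm_bytes)
# word_timings = json.loads(rec.FinalResult()).get("result", [])
```

With the grammar in place, each recognized word in `result` carries `start`/`end` times, which is exactly the word-level refinement this issue asks for.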
