First-pass quality control step for low-confidence / non-dictionary words #351

Open
moeffju opened this issue Sep 17, 2023 · 0 comments
Labels
enhancement (New feature or request), frontend

moeffju commented Sep 17, 2023

Whisper's confidence scores are pretty good and usually correlate well with the quality of the output, so it would be great to be able to quickly correct all low-confidence words in one pass before going into finer quality control. Similarly, non-dictionary words could be flagged for manual review as well.

To stick with my example from cccamp23, I would imagine it working a bit like this:

  1. I upload the audio and wait for the worker to finish
  2. I click a button to start quick correction mode
  3. The transcribee editor walks me, one by one and in context, through all tokens that have a low confidence score (below the threshold for dark red text). For each token I can edit that instance, edit all instances of the same token, or mark the token as correct (which would raise the confidence for the same token in the rest of the text)
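
The steps above could be sketched roughly like this. This is only an illustration in Python, not transcribee's actual data model: the `Token` class, the `conf` field, the `0.5` threshold, and the function names are all assumptions, and "Shufa" is invented sample data standing in for a misspelling.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    conf: float  # hypothetical field: confidence in the range 0.0 .. 1.0


# Assumed value; the editor's real "dark red" cutoff may differ.
LOW_CONF_THRESHOLD = 0.5


def review_queue(tokens, threshold=LOW_CONF_THRESHOLD):
    """Indices of the tokens to walk through, in document order."""
    return [i for i, t in enumerate(tokens) if t.conf < threshold]


def edit_all_instances(tokens, old_text, new_text):
    """Replace every instance of old_text and treat the result as reviewed."""
    for t in tokens:
        if t.text == old_text:
            t.text = new_text
            t.conf = 1.0


def mark_correct(tokens, text):
    """Raise the confidence of every instance of a token the user accepted."""
    for t in tokens:
        if t.text == text:
            t.conf = 1.0


tokens = [
    Token("Die", 0.98),
    Token("Shufa", 0.31),
    Token("und", 0.97),
    Token("Schufa", 0.42),
    Token("Shufa", 0.35),
]

queue = review_queue(tokens)                   # tokens 1, 3 and 4 need review
edit_all_instances(tokens, "Shufa", "Schufa")  # unify the spelling
mark_correct(tokens, "Schufa")                 # accept the remaining instance
remaining = review_queue(tokens)               # nothing left to review
```

Correcting one instance and accepting another empties the queue here, which is the "fix it once, apply it everywhere" effect the quick correction mode is after.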

In the aforementioned talk, I could quickly correct different spellings of "Bonify" or "Schufa" to a single spelling and mark all instances as "high confidence" that way.

On top of using Whisper's confidence scores, it could be useful to also run the transcribed text through a dictionary checker, because Whisper will sometimes transcribe things like "Geburtstartum" or "Kreditauskunftsteil" with high confidence even though those aren't dictionary words.
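
A minimal sketch of such a dictionary pass, independent of the confidence scores. The tiny `DICTIONARY` set and the `non_dictionary_words` function are placeholders made up for illustration; a real setup would presumably load a full word list or use a proper spell checker instead.

```python
import re

# Tiny stand-in word list (assumption); a real dictionary would be much larger.
DICTIONARY = {"das", "geburtsdatum", "steht", "in", "der", "kreditauskunft"}


def non_dictionary_words(text, dictionary=DICTIONARY):
    """Return words missing from the dictionary, in order, without duplicates."""
    seen = set()
    flagged = []
    # [^\W\d_]+ matches runs of letters, including umlauts and ß.
    for word in re.findall(r"[^\W\d_]+", text):
        key = word.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            flagged.append(word)
    return flagged


flagged = non_dictionary_words(
    "Das Geburtstartum steht in der Kreditauskunftsteil"
)
# flagged == ["Geburtstartum", "Kreditauskunftsteil"]
```

A plain lookup like this would probably also flag legitimate German compounds and names, so the flagged words are best treated as candidates for manual review rather than as errors.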

phlmn added the enhancement and frontend labels Sep 17, 2023