How can I use the models for Fraktur (German) ? #46
Models are available from these URLs:
We used https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_fast/frak2021_1.069_755545_3685930.traineddata (CER 1.069 % on selected ground truth) for our own most recent OCR, but depending on your texts, other models might give better results. |
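For anyone wondering how to actually run such a model: a minimal sketch, assuming the `.traineddata` file above has been downloaded into a local tessdata directory and the `tesseract` command-line tool is installed. The paths and the helper names `build_tesseract_command` and `run_ocr` are hypothetical; the language name passed to `-l` is the file name without the `.traineddata` extension.

```python
import subprocess
from pathlib import Path

# File name of the model mentioned above (downloaded beforehand).
MODEL_FILE = "frak2021_1.069_755545_3685930.traineddata"


def build_tesseract_command(image, output_base, tessdata_dir, model_file=MODEL_FILE):
    """Build the tesseract command line for a custom model (hypothetical helper)."""
    # Tesseract expects the model name without the ".traineddata" suffix.
    lang = Path(model_file).stem
    return [
        "tesseract",
        str(image),          # input image (jpg, png, ...)
        str(output_base),    # output base name; tesseract appends ".txt"
        "--tessdata-dir",
        str(tessdata_dir),   # directory that contains the model file
        "-l",
        lang,
    ]


def run_ocr(image, output_base, tessdata_dir):
    """Run tesseract; raises CalledProcessError if the command fails."""
    subprocess.run(build_tesseract_command(image, output_base, tessdata_dir), check=True)
```

Calling `run_ocr("page.png", "page", "/path/to/tessdata")` would then write the recognized text to `page.txt`; both paths here are placeholders.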
Really good. Thank you very much! I see only one problem in my test file, concerning "ſ" => "s". But anyway, really good compared with my previous tests. |
We train our models to detect the long s as "ſ", so if you want an "s", that requires a simple search and replace operation on the results. |
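The search and replace mentioned above can be sketched in a few lines of Python, assuming the OCR result is already available as a string. The function name `replace_long_s` is only illustrative.

```python
def replace_long_s(text: str) -> str:
    """Replace every long s (U+017F) with a round s in OCR output."""
    return text.replace("\u017f", "s")


# Example with a word as the model would emit it.
print(replace_long_s("Waſſer"))  # prints "Wasser"
```

The same substitution can of course be done with any editor or command-line tool; Python is used here only for illustration.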
OK, thanks, understood. As I said before, I am very happy with this result! I noticed another issue: "oͤ" instead of "ö", but not always. Maybe my bad scan is the reason; the paper is very rough. Do you prefer .jpg or .png as the source? |
The model was trained on a wide range of historic texts (from early prints to early 20th century) which include both umlaut variants "oͤ" and "ö". Tesseract does not care which image format you provide: it works with jpg, png and other image formats. |
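If modern umlauts are wanted throughout, the historic variant can be normalized after OCR. The following is a minimal sketch, assuming that "oͤ" is encoded as a base letter followed by U+0364 (COMBINING LATIN SMALL LETTER E); the function name `normalize_umlauts` is hypothetical. Note that this discards a distinction that may matter for a faithful transcription of the print.

```python
import re
import unicodedata

# Base vowel followed by the combining small letter e (U+0364).
_HISTORIC_UMLAUT = re.compile("([aouAOU])\u0364")


def normalize_umlauts(text: str) -> str:
    """Map historic umlauts such as "oͤ" to their modern form "ö"."""
    # Swap the combining e for a combining diaeresis (U+0308) ...
    text = _HISTORIC_UMLAUT.sub(lambda m: m.group(1) + "\u0308", text)
    # ... then compose base letter and diaeresis into one code point.
    return unicodedata.normalize("NFC", text)
```

For example, `normalize_umlauts("scho\u0364n")` yields the string "schön" with a precomposed "ö" (U+00F6).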
My print is from 1828. I see both variants on the same page, even though there is only a single form of "ö" on paper. |
Can you provide example images? |
Line 24: |
I would like to use your model for Fraktur. How do I set this up, or is it just a special command?