How can I use the models for Fraktur (German) ? #46

Hermann12 · 2021-04-11T10:51:21Z

I would like to use your models for Fraktur. How do I set this up, or is it just a matter of a special command?

stweil · 2021-04-11T11:17:55Z

1. Download the desired model file(s) (*.traineddata), either the fast variant (recommended for recognition) or the best variant (required for additional training).
2. Install the model file(s) in your local tessdata directory or in a subdirectory of it.
3. Optionally, rename the model file(s).
4. Run Tesseract and specify the model name with -l MODEL, prefixed with the subdirectory if you used one, and without the trailing .traineddata.
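The naming rule in the last step can be sketched in Python. This is a minimal illustration, assuming the model file has already been copied below the tessdata directory; the helper names `lang_argument` and `tesseract_command` are hypothetical, while `-l` and `--tessdata-dir` are regular Tesseract command-line options.

```python
from pathlib import Path
from typing import List, Optional


def lang_argument(model_file: Path, tessdata_dir: Path) -> str:
    """Return the value for Tesseract's -l option (hypothetical helper).

    The model name is the file's path relative to the tessdata directory,
    with "/" separators and without the trailing ".traineddata".
    """
    relative = model_file.relative_to(tessdata_dir)  # ValueError if outside tessdata
    if relative.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {model_file}")
    return relative.with_suffix("").as_posix()


def tesseract_command(image: str, output_base: str, lang: str,
                      tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a tesseract argument list (hypothetical helper, nothing is executed)."""
    cmd = ["tesseract", image, output_base, "-l", lang]
    if tessdata_dir is not None:
        # Only needed when the models are not in Tesseract's default tessdata location.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd
```

For example, `subprocess.run(tesseract_command("page.png", "page", "frak2021"), check=True)` would write the recognized text to page.txt, assuming the model was installed as frak2021.traineddata (a hypothetical name after the optional rename) and "page.png" is your scan.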

Models are available from these URLs:

https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/AustrianNewspapers/ (trained from newspapers)
https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/Fraktur_5000000/ (trained based on script/Fraktur)
https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/GT4HistOCR/ (trained from scratch)
https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/ (latest models)

We used https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/frak2021/tessdata_fast/frak2021_1.069_755545_3685930.traineddata (CER 1.069 % on selected ground truth) for our own latest OCR run, but depending on your texts, other models might give better results.
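CER (character error rate) is commonly defined as the edit (Levenshtein) distance between the OCR output and the ground truth, divided by the number of ground-truth characters. The sketch below implements that common definition; it is not necessarily the exact evaluation tool used for these models, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def cer(ocr: str, ground_truth: str) -> float:
    """Character error rate in percent, relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return 100.0 * levenshtein(ocr, ground_truth) / len(ground_truth)
```

Under this definition, an "s" in the OCR output where the ground truth has "ſ" counts as a full character error.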

Hermann12 · 2021-04-11T11:46:03Z

Really good, thank you very much! The only problem I see with my test file is "ſ" where I expected "s". But anyway, this is really good compared to my previous tests.
Example:
Vorrede.
Belehrt durch die Erfahrung, wie leicht der Zuhörer Urtheil
über die Geiſteserzeugniſſe ihres Predigers durch ſo manche

stweil · 2021-04-11T15:46:02Z

We train our models to detect the long s as "ſ", so if you want an "s", that requires a simple search and replace operation on the results.
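A minimal sketch of that search-and-replace step in Python, assuming the OCR result is already available as a string (for example, read from Tesseract's .txt output); the function name is illustrative:

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S ("ſ")


def long_s_to_s(text: str) -> str:
    """Replace every long s with a round s; all other characters are unchanged."""
    return text.replace(LONG_S, "s")
```

Applied to the example above, "Geiſteserzeugniſſe" becomes "Geisteserzeugnisse". Keeping the model output with "ſ" and converting afterwards preserves the original spelling for anyone who needs it.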

Hermann12 · 2021-04-11T20:11:21Z

OK, thanks, understood. As I said before, I am very happy with this result! I noticed another issue: "oͤ" instead of "ö", but not always. Maybe my bad scan is the reason; the paper is very rough. Do you prefer .jpg or .png as the source?
I will figure out for my project whether it is enough to improve my images or whether I have to improve the traineddata. The latter is probably more difficult.

stweil · 2021-04-11T20:35:57Z

The model was trained on a wide range of historic texts (from early prints to early 20th century) which include both umlaut variants "oͤ" and "ö". Tesseract does not care which image format you provide: it works with jpg, png and other image formats.
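If a transcription should use only modern umlauts, the historical form can be mapped afterwards. In the OCR output, the historical form is a base letter followed by U+0364 COMBINING LATIN SMALL LETTER E (as in "oͤ"). Unicode NFC normalization does not turn this into "ö", because "ö" corresponds to "o" plus U+0308 COMBINING DIAERESIS, so an explicit mapping is needed. A minimal sketch, assuming only a, o and u (lower and upper case) need handling; the names are illustrative:

```python
import unicodedata

COMBINING_E = "\u0364"  # COMBINING LATIN SMALL LETTER E

# Historical superscript-e umlauts mapped to modern umlauts (assumed set: a, o, u).
SUPERSCRIPT_E_UMLAUTS = {
    "a" + COMBINING_E: "\u00e4",  # ä
    "o" + COMBINING_E: "\u00f6",  # ö
    "u" + COMBINING_E: "\u00fc",  # ü
    "A" + COMBINING_E: "\u00c4",  # Ä
    "O" + COMBINING_E: "\u00d6",  # Ö
    "U" + COMBINING_E: "\u00dc",  # Ü
}


def modernize_umlauts(text: str) -> str:
    # NFC first, so decomposed modern umlauts (e.g. "o" + U+0308) become single
    # characters; the superscript-e forms are left untouched by NFC.
    text = unicodedata.normalize("NFC", text)
    for old, new in SUPERSCRIPT_E_UMLAUTS.items():
        text = text.replace(old, new)
    return text
```

Whether to apply such a mapping depends on the project: for a diplomatic transcription, the distinction the model makes is exactly what you want to keep.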

Hermann12 · 2021-04-11T20:54:06Z

My print is from 1828. I see both variants on the same page, even though the paper has only one form of the character, "ö".

stweil · 2021-05-19T06:02:10Z

Can you provide example images?

Hermann12 · 2021-05-19T21:21:56Z

Source:

Result: see line 24, where the same line contains two different characters:
beſtehenden allerhoͤchſten Vorſchriften kräftig zu fördern: um

stweil · 2021-05-20T03:57:34Z

Line 24 does indeed contain both variants of "ö", so the OCR result is correct in distinguishing them. "allerhöchsten" uses a lowercase "o" combined with a small "e" above it, and that is what the OCR should detect.
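To check which variant the OCR produced for a given word, it helps to list the Unicode code points of the result, since "oͤ" and "ö" can look almost identical on screen. A small sketch using Python's standard unicodedata module (the function name `describe` is illustrative):

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


if __name__ == "__main__":
    # The two words from line 24: superscript-e umlaut vs. modern umlaut.
    for word in ["allerho\u0364ch\u017ften", "f\u00f6rdern"]:
        print(word)
        for entry in describe(word):
            print("   ", entry)
```

For "allerhoͤchſten" this lists "o" followed by U+0364 COMBINING LATIN SMALL LETTER E, while "fördern" contains the single character U+00F6 LATIN SMALL LETTER O WITH DIAERESIS.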

Hermann12 · 2021-05-20T15:57:35Z

Line 24:
case 1: "allerhöchsten" => "o" & "e"
case 2: "fördern" => "ö" (in the same line)
Why?

stweil added the question label May 7, 2021
