psm argument doesn't work with Tesseract 4.0 #110
Comments
I found a config argument, "tessedit_create_hocr": "1", that returns the data in hOCR format.
Do the docs need to be improved?
Yes, it would be great to improve the docs on the available output formats.
Hello @tleyden, do you have any updates? Thanks.
Hello! We're having the exact same problem. We would like to launch tesseract with the psm:3 parameter, but this fails with Tesseract 4.0.
The problem seems to be in this line: Line 87 in 1cd43c1
We managed to fix it, for Tesseract 4.0 only, by changing it to result = append(result, "--psm"). The code probably needs to switch between the two cases to keep the change backward compatible.
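A rough sketch of what that switch could look like in Go is below. It is not the actual open-ocr code. The helper names (psmFlag, parseMajorVersion, appendPsmArg) are hypothetical, and it assumes the first line of `tesseract --version` has the form "tesseract 4.0.0...". It relies on the behaviour reported in this issue: "-psm" for 3.x and "--psm" for 4.0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psmFlag picks the page-segmentation flag for a Tesseract major version,
// following this issue: "-psm" for 3.x, "--psm" for 4.x and later.
func psmFlag(majorVersion int) string {
	if majorVersion >= 4 {
		return "--psm"
	}
	return "-psm"
}

// parseMajorVersion reads the major version from a version line such as
// "tesseract 4.0.0-beta.1". The line format is an assumption.
func parseMajorVersion(versionLine string) (int, error) {
	fields := strings.Fields(versionLine)
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected version output: %q", versionLine)
	}
	// Keep only the text before the first dot, e.g. "4" from "4.0.0-beta.1".
	major := strings.SplitN(strings.TrimPrefix(fields[1], "v"), ".", 2)[0]
	return strconv.Atoi(major)
}

// appendPsmArg appends the version-appropriate flag and its value to the
// argument list; an empty psm leaves the list unchanged.
func appendPsmArg(result []string, majorVersion int, psm string) []string {
	if psm == "" {
		return result
	}
	return append(result, psmFlag(majorVersion), psm)
}

func main() {
	major, err := parseMajorVersion("tesseract 4.0.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(appendPsmArg(nil, major, "3"))
}
```

In a real fix the version line would come from running the tesseract binary once at startup rather than from a hard-coded string.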
Could you please say where exactly I need to replace it? I went into the Docker containers of the worker and httpd, located the mentioned file, changed it, and restarted both containers. The error is still the same.
There is a workaround for this issue:
Use command below to get into
Refactor the source code
Recompiling execution file
If you encountered the message below:
Restart docker container
Hello,
I'm trying to launch your environment with the tleyden5iwx/open-ocr-2 image.
This image should contain Tesseract 4.0. It looks like decoding an image/PDF with the psm argument doesn't work.
Request Body:
{
  "img_url": "http://bit.ly/ocrimage",
  "engine": "tesseract",
  "engine_args": {
    "config_vars": {
      "tessedit_char_whitelist": "0123456789"
    },
    "psm": "3"
  }
}
Response:
Error processing image url: . Error: exit status 1
In Tesseract 3.*, the psm argument uses a single dash ("-psm"); in Tesseract 4.0, it uses two ("--psm"). I think this is the main issue.
By the way, could you add an argument that lets me control the output format, not only raw text? I would like to receive the text in *.hocr format too, and in any other supported format. I would really appreciate this feature!
Thanks!