Run tesseract with OMP_THREAD_LIMIT=1 #203

Open
3 tasks
kirkkwang opened this issue Mar 28, 2023 · 1 comment
@kirkkwang
Contributor

kirkkwang commented Mar 28, 2023

Story

Indiana University has a setup in their code that runs tesseract with OMP_THREAD_LIMIT=1. It would be nice to bring this over to IIIF Print.

Acceptance Criteria

  • Tesseract runs with OMP_THREAD_LIMIT=1
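
As a rough illustration of the acceptance criterion, here is a minimal Python sketch of launching tesseract with `OMP_THREAD_LIMIT=1` set only for the child process. This is not IIIF Print's or Indiana University's actual code; the helper names `tesseract_env` and `run_tesseract`, the `hocr` output choice, and the file paths are assumptions made for the example.

```python
import os
import subprocess


def tesseract_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread.

    Hypothetical helper: the caller's environment is never mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def run_tesseract(image_path, output_base, tesseract_bin="tesseract"):
    """Run `tesseract <image> <output_base> hocr` with OMP_THREAD_LIMIT=1.

    Assumes a tesseract binary is on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    cmd = [tesseract_bin, image_path, output_base, "hocr"]
    return subprocess.run(cmd, env=tesseract_env(), check=True)
```

Under these assumptions, something like `run_tesseract("page-1.tif", "page-1")` would be roughly equivalent to running `OMP_THREAD_LIMIT=1 tesseract page-1.tif page-1 hocr` in a shell. Setting the variable per process keeps the limit from leaking into unrelated jobs on the same worker.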

Testing Instructions and Sample Files

GOAL: Test whether tesseract runs faster in a deployed environment. Each page should take about 10-12 mins instead of 30+ mins.

sample pdf: service-rbc-rbc0001-2015-2015gen56010-2015gen56010 (1).pdf

  • ingest the sample PDF into UTK (I believe UTK is the last project that doesn't have OMP_THREAD_LIMIT=1)
  • observe that each page takes no more than 10-12 mins to run.

Notes

Close this ticket after verifying it works.

@ShanaLMoore
Contributor

ShanaLMoore commented Apr 3, 2023

TODO: We need to upgrade the iiif_print version in the various applications and deploy them to staging in order to test this work.


2 participants