Would like to grab data from this pdf but not a defined table and text sometimes leaks over to other columns, how do I get all the data? #1164
MatinQurban
started this conversation in
Ask for help with specific PDFs
This seems like a type of PDF where you might have more luck first identifying the bounding boxes of the page's core sections (using either the bold text in Alternatively, you may have more luck using |
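As a rough sketch of the "bounding boxes of the page's core sections" idea: assuming each section starts at a bold heading, the heading positions can be turned into one box per section. The function name `section_bboxes` and the `min_gap` parameter are hypothetical, and the way headings are detected (a `"Bold"` substring in the font name) is an assumption that may not hold for this PDF.

```python
def section_bboxes(heading_tops, page_width, page_height, min_gap=5):
    """Turn heading y-positions into (x0, top, x1, bottom) boxes, one per section."""
    tops = []
    for top in sorted(heading_tops):
        # Characters of the same heading share nearly the same top; keep one per heading.
        if not tops or top - tops[-1] > min_gap:
            tops.append(top)
    # Each section runs from its heading down to the next heading (or the page bottom).
    bottoms = tops[1:] + [page_height]
    return [(0, top, page_width, bottom) for top, bottom in zip(tops, bottoms)]


# Hypothetical usage with pdfplumber (assumes bold fonts have "Bold" in their name):
#   bold_tops = [c["top"] for c in page.chars if "Bold" in c["fontname"]]
#   for bbox in section_bboxes(bold_tops, page.width, page.height):
#       section = page.crop(bbox)
#       print(section.extract_text())
```

Each returned tuple is in the `(x0, top, x1, bottom)` order that `page.crop()` expects, so every section can then be parsed on its own.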
# `im` is assumed to be an image created earlier with page.to_image()
table_settings = {
    "vertical_strategy": "text",
    "horizontal_strategy": "lines",
    "snap_tolerance": 3,
}
im.reset().debug_tablefinder(table_settings)
Currently I am going line by line and using each element's position relative to everything else on the line to guess where each value belongs. This sometimes fails because the values are very hard to tell apart. For example, labor hours and paint hours have very similar values, so I cannot simply search for decimal values below 10.0, because that would match both of them.
Is there a way to get only the labor and paint hours since those two columns seem to be grouped together properly?
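One possible way to separate the two columns is by horizontal position instead of by value. The sketch below is not tested against this PDF. It takes word dictionaries shaped like the ones `page.extract_words()` returns (`text`, `x0`, `x1`, `top`) and keeps only words whose horizontal center falls inside an x-range for each column. The function name `hours_by_row`, the x-ranges, and `row_tolerance` are all hypothetical; the real ranges would have to be measured on the page, for example from the column headers.

```python
def hours_by_row(words, labor_x, paint_x, row_tolerance=3):
    """Return (labor, paint) text pairs, one per row, chosen by x-position only."""
    rows = []  # each entry: {"top": float, "labor": str or None, "paint": str or None}
    for word in sorted(words, key=lambda w: w["top"]):
        center = (word["x0"] + word["x1"]) / 2
        if labor_x[0] <= center <= labor_x[1]:
            column = "labor"
        elif paint_x[0] <= center <= paint_x[1]:
            column = "paint"
        else:
            continue  # word belongs to some other column; ignore it
        # Start a new row when this word sits clearly below the previous kept word.
        if not rows or abs(word["top"] - rows[-1]["top"]) > row_tolerance:
            rows.append({"top": word["top"], "labor": None, "paint": None})
        rows[-1][column] = word["text"]
    return [(row["labor"], row["paint"]) for row in rows]


# Hypothetical usage with pdfplumber (x-ranges are made-up placeholders):
#   words = page.extract_words()
#   pairs = hours_by_row(words, labor_x=(390, 430), paint_x=(450, 490))
```

Because the assignment depends only on where a word sits, a labor value of 2.5 and a paint value of 2.5 stay distinguishable, and a row with only one of the two values gets `None` for the other.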