Replies: 1 comment
-
Hi @wdchild, I think you're looking for the |
-
I've spent some time exploring `pdfplumber`, including taking a look at the `page.chars` and `page.rects` objects and the `page.extract_text` method. Chars come with their positions well documented in the returned data. But is there a sensible way to identify the positions / bounding boxes of individual words in the text? For scanned images, engines like tesseract let you bound individual words (or what the engine perceives as words). I'm seeing no easy way to do that with pdfplumber. What am I missing? Thanks!
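
For anyone landing here with the same question, here is a minimal sketch of how per-character positions can be merged into word bounding boxes by hand. It assumes each char is a dict with `text`, `x0`, `x1`, `top` and `bottom` keys (the shape of the entries in `page.chars`) and that the chars arrive in reading order. The helper name `words_from_chars` and the tolerances `x_gap` and `y_tol` are hypothetical, chosen only for illustration; this is not pdfplumber's own implementation.

```python
def _bbox(chars):
    """Merge a run of char dicts into one word dict with a bounding box."""
    return {
        "text": "".join(c["text"] for c in chars),
        "x0": min(c["x0"] for c in chars),
        "x1": max(c["x1"] for c in chars),
        "top": min(c["top"] for c in chars),
        "bottom": max(c["bottom"] for c in chars),
    }


def words_from_chars(chars, x_gap=3.0, y_tol=3.0):
    """Group chars into words (hypothetical helper, assumes reading order).

    A word ends at a whitespace char, at a horizontal gap wider than
    x_gap, or when the vertical position shifts by more than y_tol.
    """
    words, current = [], []
    for ch in chars:
        if ch["text"].isspace():
            # Whitespace closes the current word and is not kept.
            if current:
                words.append(_bbox(current))
                current = []
            continue
        if current:
            prev = current[-1]
            new_line = abs(ch["top"] - prev["top"]) > y_tol
            far_apart = ch["x0"] - prev["x1"] > x_gap
            if new_line or far_apart:
                words.append(_bbox(current))
                current = []
        current.append(ch)
    if current:
        words.append(_bbox(current))
    return words
```

With pdfplumber itself, the list in `page.chars` could be passed to a helper like this. The library also documents a `page.extract_words()` method that returns word-level dicts with coordinates, which is probably the simpler route when its defaults fit the document.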