How does "extract words" work? #412
mangu75
started this conversation in
Ask for help with specific PDFs
-
Hello, and apologies for the late response; I lost track of this inquiry. It's difficult to say with certainty without access to the PDF, but I believe you are seeing words with multiple spaces because you have set keep_blank_chars=True. By default, ...
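To illustrate the idea, here is a simplified sketch of how grouping characters into words could behave with and without keep_blank_chars. It is not pdfplumber's actual implementation: group_words and make_chars are hypothetical helpers, and the fixed character width of 5 units is an assumption made only to keep the example self-contained.

```python
# Simplified model of word grouping on a single line.
# NOT pdfplumber's real code; names and layout here are illustrative assumptions.

def group_words(chars, x_tolerance=1, keep_blank_chars=False):
    """Group characters (dicts with "text", "x0", "x1", sorted left to right) into words."""
    words, current = [], []
    for ch in chars:
        if ch["text"].isspace() and not keep_blank_chars:
            # A blank character ends the current word and is discarded.
            if current:
                words.append(current)
                current = []
            continue
        if current and ch["x0"] > current[-1]["x1"] + x_tolerance:
            # Horizontal gap wider than the tolerance: start a new word.
            words.append(current)
            current = []
        current.append(ch)
    if current:
        words.append(current)
    return ["".join(c["text"] for c in w) for w in words]


def make_chars(text, width=5):
    """Lay out text as adjacent fixed-width characters (hypothetical helper)."""
    return [
        {"text": t, "x0": i * width, "x1": (i + 1) * width}
        for i, t in enumerate(text)
    ]


chars = make_chars("foo  bar")  # two space characters between the words
print(group_words(chars, keep_blank_chars=False))
print(group_words(chars, keep_blank_chars=True))
```

In this model, a space is a character with its own width. When blank characters are kept, each space sits directly next to its neighbour, so the gap never exceeds x_tolerance and the run "foo  bar" stays together as one word, however many spaces it contains.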
-
Hi,
I'm using extract_words to get all the words from a PDF, but I'm not sure exactly how it works.
This is how I'm using it:
pdfplumber.utils.extract_words(pdf_content.chars, x_tolerance=1, y_tolerance=1, keep_blank_chars=True)
I understand that it returns a list in which each item is a group of words separated by just one space. However, it sometimes joins words that are more than one space apart, such as this one:
What is the explanation?
Thank you!!