Not reading the pdf file #804
drnko started this conversation in "Ask for help with specific PDFs"
Hi @drnko, and thanks for your interest in this library. It appears you're trying to do the following:
The PDF you create in step 1 is an "image-based PDF" (see here for context), and contains no information about the actual text it represents — it's just a picture. You can try using OCR (optical character recognition) software to add a text layer back to the PDF, but |
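To make the "image-based PDF" point concrete, here is a minimal sketch. It assumes that pdfplumber pages expose `.chars` and `.images` lists, and that an image-only page is one with images but no characters. The helper names `is_image_only` and `ocr_image_to_searchable_pdf` are hypothetical. The OCR helper additionally assumes that the third-party `pytesseract` package and the Tesseract binary are installed; neither is mentioned in this thread, and it is only one possible way to add a text layer.

```python
from types import SimpleNamespace


def is_image_only(page):
    """Return True if the page has embedded images but no text characters.

    `page` is expected to expose `.chars` and `.images` lists, as
    pdfplumber's Page objects do.
    """
    return len(page.chars) == 0 and len(page.images) > 0


def ocr_image_to_searchable_pdf(image_path, pdf_path):
    """Write a PDF with an OCR text layer (assumes pytesseract + Tesseract)."""
    import pytesseract  # imported lazily: optional third-party dependency

    # Returns the bytes of a PDF containing the image plus recognized text.
    pdf_bytes = pytesseract.image_to_pdf_or_hocr(image_path, extension="pdf")
    with open(pdf_path, "wb") as f:
        f.write(pdf_bytes)


# Stand-ins for pdfplumber pages, so the check can be tried without a PDF.
scanned_page = SimpleNamespace(chars=[], images=[{"name": "Im0"}])
text_page = SimpleNamespace(chars=[{"text": "A"}], images=[])

print(is_image_only(scanned_page))  # True
print(is_image_only(text_page))     # False
```

With a real file, the same check would be `is_image_only(pdfplumber.open('test.pdf').pages[0])`. If it returns True, `extract_text()` has no text to find, and running OCR on the source image first (for example with the helper above) is one way to get a PDF that pdfplumber can read.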
Whenever I convert an image to PDF and try to extract the text from the converted PDF, the result from pdfplumber is blank.
What am I doing wrong?
Step 1:
Convert an image (jpeg/jpg/png) to PDF using PIL.
Save the converted PDF file.
Step 2:
Open the converted/saved PDF using pdfplumber.open().
Extract text from the opened PDF file.
===============================================================
Below is the code:
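from PIL import Image
import pdfplumber

# Step 1: convert the image to a single-page PDF
image_1 = Image.open(r'D:\ocr\images\barrel.jpg')
im_1 = image_1.convert('RGB')
im_1.save(r'test.pdf')
# Step 2: open the PDF and extract text from the first page
inv_pdf = pdfplumber.open('test.pdf')
print('Result:', inv_pdf.pages[0].extract_text())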
===============================================================
Terminal:
PS D:\ocr> & "C:/Program Files/Python310/python.exe" d:/ocr/testing.py
Result:
PS D:\GitOCR\ocr>
===============================================================
Below are the PDF files converted from the image files:
test.pdf
test1.pdf
test2.pdf
version: pdfplumber 0.7.6