How to extract table of contents in PDF #481
PyPDF2 has a getOutlines() function to extract the table of contents from a PDF. Is there a similar function in pdfplumber? I read the README and searched the issues and Google but didn't find anything related.
Answered by
samkit-jain
Jul 23, 2021
>>> import pdfplumber
>>> pdf = pdfplumber.open("file.pdf")
>>> pdf.doc.get_outlines()
Answer selected by
Ynjxsjmh
Hi @Ynjxsjmh, pdfplumber is built on pdfminer.six, which also provides a get_outlines(...) method. It might behave differently from the one provided by PyPDF2. To access it, you can use the code shown above. pdf.doc is an instance of PDFDocument. An example of how to use the method can be found here.
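
For anyone who wants to go one step further than the call above, here is a rough sketch of turning the outline into an indented list of titles. It assumes each entry yielded by get_outlines() is a (level, title, dest, action, se) tuple and that pdfminer.six raises PDFNoOutlines when the file has no outline. The helper names format_outline and read_outline are made up for this example and are not part of pdfplumber.

```python
def format_outline(entries):
    """Turn (level, title, dest, action, se) tuples into indented title lines.

    Assumption: top-level entries have level 1, so they get no indent.
    """
    lines = []
    for level, title, _dest, _action, _se in entries:
        indent = "  " * max(level - 1, 0)
        lines.append(f"{indent}{title}")
    return lines


def read_outline(path):
    """Hypothetical helper: return the outline of the PDF at `path` as text lines."""
    # Imported here so format_outline can be used without the libraries installed.
    import pdfplumber
    from pdfminer.pdfdocument import PDFNoOutlines

    with pdfplumber.open(path) as pdf:
        try:
            # Consume the entries while the file is still open.
            return format_outline(pdf.doc.get_outlines())
        except PDFNoOutlines:
            # The PDF has no table of contents embedded.
            return []
```

Calling read_outline("file.pdf") should then give one string per outline entry, indented by nesting depth, or an empty list if the PDF carries no outline.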