How to extract table of contents in PDF #481

Ynjxsjmh · 2021-07-23T00:11:32Z


PyPDF2 has a getOutlines() function to extract the table of contents from a PDF. Is there a similar function in pdfplumber? I read the README and searched the issues and Google but didn't find anything related.

Answered by samkit-jain


samkit-jain · 2021-07-23T13:59:05Z

Collaborator

Hi @Ynjxsjmh, pdfplumber is built on pdfminer.six, which provides its own get_outlines(...) method. Its output may differ from that of PyPDF2's getOutlines(). To access it, you can use the following code:

>>> import pdfplumber
>>> pdf = pdfplumber.open("file.pdf")
>>> pdf.doc.get_outlines()

pdf.doc is an instance of pdfminer.six's PDFDocument class. An example of how to use the method can be found here.
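To make the call above a bit more concrete, here is a minimal sketch of turning the outline into an indented list of titles. It assumes that get_outlines() yields (level, title, dest, action, se) tuples with top-level entries at level 1, and that pdfminer.six raises PDFNoOutlines when the PDF has no outline; both are worth checking against your installed version. The helper names format_outline and read_outline are made up for this example and are not part of pdfplumber.

```python
from typing import Iterable, List


def format_outline(entries: Iterable[tuple]) -> List[str]:
    """Turn (level, title, ...) outline tuples into indented title lines."""
    lines = []
    for level, title, *_rest in entries:
        # Assumption: top-level entries have level 1, so indent by (level - 1).
        indent = "  " * max(level - 1, 0)
        lines.append(indent + str(title))
    return lines


def read_outline(path: str) -> List[str]:
    """Hypothetical helper: return the outline titles of the PDF at `path`."""
    # Imported here so format_outline() can be used without these packages.
    import pdfplumber
    from pdfminer.pdfdocument import PDFNoOutlines

    with pdfplumber.open(path) as pdf:
        try:
            # get_outlines() is lazy, so consume it while the file is open
            # and inside the try block.
            return format_outline(pdf.doc.get_outlines())
        except PDFNoOutlines:
            # Assumption: raised when the PDF has no outline at all.
            return []
```

Calling read_outline("file.pdf") should then give one string per outline entry, indented by nesting depth. The dest and action fields are ignored here; resolving them to page numbers takes extra work that this sketch does not attempt.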
