Accuracy of identifying pdf table #629

hhhhjjj · 2022-03-23T02:51:57Z

hhhhjjj
Mar 23, 2022

Do you have any figures on the accuracy of PDF table identification?


jsvine · 2022-03-23T13:33:29Z

jsvine
Mar 23, 2022
Maintainer

Hi @hhhhjjj, and thanks for your interest in this library. I take your question to mean something like: Has pdfplumber's table-detection algorithm been tested against a benchmark, and evaluated re. whether the tables are extracted correctly? (If that's not your question, please do let me know.)

The short answer is: no, it has not. The longer answer is that pdfplumber's table-detection algorithm does not take a probabilistic approach, but rather a deterministic one. And although it aims to provide utility with just its default settings, it provides the most utility when you customize the detection settings to the particular PDF you are parsing. So although it's theoretically possible to test pdfplumber's default table-detection approach against an external benchmark, and doing so might provide some helpful comparisons for people who want to bulk-and-auto-extract tables from a large number of different PDFs at once, I don't think it would represent pdfplumber's typical usage.
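For anyone who wants to measure this on their own documents, here is a minimal sketch of what such a check might look like. It is not part of pdfplumber: the `cell_accuracy` metric, the helper names, and the settings values are assumptions chosen for illustration, and the expected table would have to be transcribed by hand. Only `pdfplumber.open`, `page.extract_tables`, and the `table_settings` keys shown are the library's own.

```python
# Illustrative settings only; suitable values depend on the PDF being parsed.
TEXT_ALIGNED_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
    "snap_tolerance": 3,
}


def extract_page_tables(pdf_path, page_index=0, table_settings=None):
    """Return the tables pdfplumber finds on one page (hypothetical helper)."""
    import pdfplumber  # third-party; imported here so the scoring code runs without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_tables(table_settings=table_settings or {})


def normalize(cell):
    """Treat None as empty and collapse runs of whitespace."""
    return " ".join((cell or "").split())


def cell_accuracy(extracted, expected):
    """Fraction of expected cells found at the same row and column.

    A deliberately simple, assumed metric: it does not credit cells that
    are correct but shifted by a row or column.
    """
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    matches = 0
    for r, expected_row in enumerate(expected):
        got_row = extracted[r] if r < len(extracted) else []
        for c, expected_cell in enumerate(expected_row):
            got = got_row[c] if c < len(got_row) else None
            if normalize(got) == normalize(expected_cell):
                matches += 1
    return matches / total
```

Scoring the same page once with the defaults and once with settings tuned to that PDF (for example `extract_page_tables(path, 0, TEXT_ALIGNED_SETTINGS)`) would show how much the customization matters for that particular document, which is the effect a defaults-only benchmark would miss.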

1 reply

hhhhjjj Mar 24, 2022
Author

Thank you for your answer
