Accuracy of identifying pdf table #629

Answered by jsvine
hhhhjjj asked this question in Q&A

Hi @hhhhjjj, and thanks for your interest in this library. I take your question to mean something like: Has pdfplumber's table-detection algorithm been tested against a benchmark and evaluated on whether it extracts tables correctly? (If that's not your question, please do let me know.)

The short answer is: no, it has not. The longer answer is that pdfplumber's table-detection algorithm does not take a probabilistic approach, but rather a deterministic one. And although it aims to provide utility with just its default settings, it provides the most utility when you customize the detection settings to the particular PDF you are parsing. So although it's theoretically possible to test
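
To illustrate what customizing the detection settings for a particular PDF can look like, here is a minimal sketch. The setting values, the helper name `extract_tables_with_settings`, and the file name in the usage note are assumptions chosen for illustration, not recommendations from the answer; the right values depend on the document being parsed.

```python
# Illustrative table-detection settings (assumed values, tune per PDF).
# "lines" uses ruling lines drawn on the page; "text" infers boundaries
# from the alignment of words.
TABLE_SETTINGS = {
    "vertical_strategy": "lines",
    "horizontal_strategy": "text",
    "snap_tolerance": 3,
    "intersection_tolerance": 3,
}


def extract_tables_with_settings(pdf_path, page_number=0, table_settings=None):
    """Return the tables found on one page, using custom detection settings.

    Hypothetical helper: pdf_path and page_number are supplied by the caller.
    """
    # Imported here so the settings above can be inspected without
    # pdfplumber installed.
    import pdfplumber

    settings = TABLE_SETTINGS if table_settings is None else table_settings
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        # Each table is a list of rows; each row is a list of cell strings.
        return page.extract_tables(table_settings=settings)
```

Calling `extract_tables_with_settings("report.pdf")` (a hypothetical file) and comparing the result with `page.extract_tables()` under the default settings is one way to see how much the choice of settings changes the output for a given document.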

Answer selected by hhhhjjj