-
Hello, and interesting example! Right now, |
-
I have done this sort of analysis in other environments. I don't know the precise details of pdfminer's / pdfplumber's CTM (current transformation matrix), so don't hesitate to correct me, but normally these follow the description in https://en.wikipedia.org/wiki/Transformation_matrix#Rotation. Your image looks like a pure rotation; the usual problem is to find the point about which it is rotated. (Note that the sign of the rotation may differ between systems: you may have - and + reversed, depending on the direction of t.)
Your problem is to assemble the correct characters on a line and calculate the (new) x coordinates so you can determine the sequence of characters, spaces, etc. You need to determine the points about which the string is rotated. In this case you could rotate the whole diagram by +30 deg (e.g. anticlockwise around the origin) and try to find new strings as if they were aligned along X. It's quite likely that the characters are stored in an arbitrary order, so you will need to use geometry to determine which string is which. You can't rely on the order in which they are written.
Hope this is enlightening rather than confusing. It may be useful to post a snippet of the raw matrices.
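A minimal sketch of the rotate-then-group idea in plain Python. It assumes each character is a dict with `text`, `x0` and `y0` keys (hypothetical here, loosely modelled on the character dicts pdfplumber returns), that all the text shares one rotation angle, and that y increases upwards. The helper names and `y_tolerance` are illustrative, and the sign of the angle may need flipping in your coordinate system.

```python
import math


def rotate_point(x, y, degrees):
    """Rotate (x, y) about the origin; positive angles are anticlockwise
    when the y axis points upwards."""
    t = math.radians(degrees)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def group_rotated_chars(chars, degrees, y_tolerance=2.0):
    """Rotate every character, cluster by rotated y, then sort by rotated x.

    Returns one string per detected line, ordered by increasing rotated y.
    The input dicts are left untouched.
    """
    rotated = []
    for ch in chars:
        rx, ry = rotate_point(ch["x0"], ch["y0"], degrees)
        rotated.append((ry, rx, ch["text"]))
    rotated.sort()  # by rotated y first, so neighbours share a line

    lines = []
    for ry, rx, text in rotated:
        # Start a new line when the rotated y jumps by more than the tolerance.
        if lines and abs(ry - lines[-1]["y"]) <= y_tolerance:
            lines[-1]["items"].append((rx, text))
        else:
            lines.append({"y": ry, "items": [(rx, text)]})

    # Within each line, the rotated x gives the reading order.
    return ["".join(t for _, t in sorted(line["items"])) for line in lines]
```

Because the grouping uses only geometry, the order in which the characters appear in the input does not matter. Rotating about the origin is enough for this purpose: choosing a different centre only shifts every rotated point by the same offset, which does not change which characters share a line or their order along it.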
-
You are rotating the x0,y0 (and possibly x1, y1) of each character. These
are points annotated/linked to the codepoints. But it's only the
coordinates that are transformed.
I can't give a general answer as you have only shown a small image (I
assume there is more). The key question is whether all the text is rotated
by the same amount or whether each string is at a different angle. The
latter case will be harder.
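One way to check whether all the text shares a single angle is to read the angle off each character's matrix and bucket the characters by it. The sketch below assumes each character is a dict carrying a `matrix` 6-tuple `(a, b, c, d, e, f)` in the usual PDF layout, where a pure rotation by θ has `a = cos θ` and `b = sin θ`. The function names are illustrative, and the sign convention should be checked against your own raw matrices.

```python
import math
from collections import defaultdict


def matrix_angle(matrix):
    """Rotation angle in degrees implied by a 6-tuple (a, b, c, d, e, f)."""
    a, b = matrix[0], matrix[1]
    return math.degrees(math.atan2(b, a))


def group_by_angle(chars, precision=0):
    """Bucket characters by their rotation angle, rounded to `precision` places."""
    groups = defaultdict(list)
    for ch in chars:
        groups[round(matrix_angle(ch["matrix"]), precision)].append(ch)
    return dict(groups)
```

If this yields a single bucket, one global rotation should be enough; several buckets suggest that each group needs to be rotated by its own angle before the characters are assembled into lines.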
…On Tue, May 9, 2023 at 10:18 AM Ritchie Poh ***@***.***> wrote:
I have tried multiplying by the 2D rotation matrix. However, I realized I'm
not able to overwrite the original coordinates of the character. I would
also like to know whether, if I follow this method, I am rotating the text
or rotating the axes.
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
-
I'm currently trying to extract text from a PDF file that contains rotated text. My current code extracts the correct rotated characters. However, the extracted text is not grouped into lines.
An example of the PDF file:
Text extracted
My expected output will be, for example, `Do` in one line, `D1` in one line, `MIN.` in one line.
I think there's a rotation parameter exposed by `pdfminer.six` here at line 38, but I'm not sure whether that's used to extract rotated text, or whether I can use it from `pdfplumber`.