Long list lines do not wrap #13

Open
scumop opened this issue Jul 7, 2011 · 1 comment · May be fixed by #118

Comments

@scumop

scumop commented Jul 7, 2011

I noticed that an `<li>` with 200 characters outputs a 200-character-long line.
I found this irritating, so I added some code to the `optwrap(text)` method in v3.02.

This is just a fragment:

WAS:

```python
for para in text.split("\n"):
    if len(para) > 0:
        if para[0] != ' ' and para[0] != '-' and para[0] != '*':
            for line in wrap(para, BODY_WIDTH):
                result += line + "\n"
            result += "\n"
            newlines = 2
        else:
            if not onlywhite(para):
                result += para + "\n"
                newlines = 1
```
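To make the problem concrete, here is a runnable reduction of the loop above. It is a sketch, not html2text itself: `optwrap_before` and the `onlywhite` stand-in are hypothetical, `BODY_WIDTH = 78` is assumed, and the `newlines` bookkeeping is left out. It shows that any line starting with a space (which is how list items come out) skips `wrap()` entirely.

```python
from textwrap import wrap

BODY_WIDTH = 78  # assumed; html2text's default body width


def onlywhite(line):
    """Hypothetical stand-in for html2text's helper: True if the line is
    made only of spaces and tabs."""
    return line.strip(" \t") == ""


def optwrap_before(text):
    """Hypothetical reduction of the quoted v3.02 loop (no `newlines`)."""
    result = ""
    for para in text.split("\n"):
        if len(para) > 0:
            if para[0] not in (" ", "-", "*"):
                # Ordinary paragraph: wrapped to BODY_WIDTH.
                for line in wrap(para, BODY_WIDTH):
                    result += line + "\n"
                result += "\n"
            elif not onlywhite(para):
                # Lines starting with ' ', '-' or '*' (list items included)
                # are copied through untouched, however long they are.
                result += para + "\n"
    return result


item = "  * " + " ".join(["word"] * 40)  # a ~200-character list line
print(max(len(line) for line in optwrap_before(item).splitlines()))  # 203
```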

IS:

```python
# Raw string so "\." reaches the regex engine unchanged.
reList = re.compile(r'(^[ ]+[0-9]+\. )|(^[ ]+\* )')
for para in text.split("\n"):
    if len(para) > 0:
        if para[0] != ' ' and para[0] != '-' and para[0] != '*':
            for line in wrap(para, BODY_WIDTH):
                result += line + "\n"
            result += "\n"
            newlines = 2
        else:
            # Handle list item - split lines with indent under.
            match = reList.match(para)
            if match:
                # Find length to start of text for indent spacing. Taken from
                # the unwrapped line: wrap() can leave only "  *" on the first
                # line when the first word is long, and then the regex misses.
                indent_spaces = ' ' * len(match.group())
                first = True
                for line in wrap(para, BODY_WIDTH - 6):  # allowance for indentation pad
                    if first:
                        first = False
                        result += line + "\n"
                    else:
                        result += indent_spaces + line + "\n"
                result += "\n"
                newlines = 1
            elif not onlywhite(para):
                result += para + "\n"
                newlines = 1
```
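A standalone, runnable version of the list-item branch above, useful for trying it on the 200-character case in isolation. This is a sketch: `wrap_list_para` is a hypothetical name, `BODY_WIDTH = 78` is assumed, and it keeps the patch's fixed 6-column allowance and its list-marker regex.

```python
import re
from textwrap import wrap

BODY_WIDTH = 78  # assumed; html2text's default body width

# The patch's pattern, as a raw string: an indented "N. " or "* " marker.
reList = re.compile(r"(^[ ]+[0-9]+\. )|(^[ ]+\* )")


def wrap_list_para(para):
    """Hypothetical standalone version of the patched list-item branch.

    Returns the wrapped lines; continuation lines are padded with spaces
    so they start under the item text rather than under the marker.
    """
    match = reList.match(para)
    if match is None:
        return [para]  # not a list item: left as-is
    indent_spaces = " " * len(match.group())
    # Same fixed 6-column allowance for the indentation pad as the patch.
    lines = wrap(para, BODY_WIDTH - 6)
    return lines[:1] + [indent_spaces + line for line in lines[1:]]


item = "  * " + " ".join(["word"] * 40)  # the ~200-character case
for line in wrap_list_para(item):
    print(line)
```

Because the allowance is fixed at six columns rather than taken from the marker, a wider marker (for example `    10. `, eight characters) can push continuation lines past `BODY_WIDTH`.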
@aaronsw
Owner

aaronsw commented Jul 8, 2011

Can you submit this as a pull request?

stefanor pushed a commit to stefanor/html2text that referenced this issue Oct 18, 2014
mkllnk added a commit to mkllnk/html2text that referenced this issue Mar 11, 2019
Paragraphs are usually wrapped at 78 characters per line. This patch
applies that to list items as well. It contains elements from the code
scumop posted in aaronsw#13 (comment), but it has been rewritten to fix
the number of newline characters and to increase readability and
performance.
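The commit message describes applying the 78-column paragraph width to list items. One way to get the hanging indent without tracking the first line by hand is the standard library's `textwrap.fill` with `initial_indent` and `subsequent_indent`, sizing the indent from the actual marker. This is a sketch under that assumption, not the code from the linked pull request; `wrap_list_item` and `LIST_MARKER` are hypothetical names, and the pattern assumes indented `* ` and `N. ` markers.

```python
import re
import textwrap

BODY_WIDTH = 78  # "usually wrapped at 78 characters per line"

# Hypothetical pattern: leading spaces, then "* " or "N. ".
LIST_MARKER = re.compile(r"^( +(?:\*|[0-9]+\.) )")


def wrap_list_item(line, width=BODY_WIDTH):
    """Wrap one list-item line to `width` with a hanging indent.

    Non-list lines are returned unchanged.
    """
    match = LIST_MARKER.match(line)
    if match is None:
        return line
    marker = match.group(1)
    body = line[len(marker):]
    # Both indents count toward `width`, so every output line fits.
    return textwrap.fill(
        body,
        width=width,
        initial_indent=marker,
        subsequent_indent=" " * len(marker),
    )


print(wrap_list_item("  * " + " ".join(["word"] * 40)))
print(wrap_list_item("    10. " + " ".join(["word"] * 40)))
```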
mkllnk linked a pull request Mar 11, 2019 that will close this issue