Closer look at inline HTML in Markdown #29

Open
Omikhleia opened this issue Nov 21, 2022 · 0 comments
Labels
enhancement New feature or request
Milestone

Comments

@Omikhleia
Owner

Omikhleia commented Nov 21, 2022

A generalization on #13 with elements of analysis and discussion...

  • Block elements, except <hr>

    • Pandoc has extensions native_divs and markdown_in_html_blocks, both enabled by default
    • As of yet, these are unsupported extensions in Lunamark
    • Lunamark has writer.display_html which should work as in Pandoc with the above extensions disabled. (Should = I didn't test).
    • This encompasses a lot of things (incl. <table> for instance), most of which we cannot easily render in a satisfying way (even with the help of a 3rd party HTML parsing library, such as htmlparser hinted at in DRAFT: HTML support #18)
  • Block elements, the special case of <hr>

    • Since it is a void element that is not supposed to have any content (per the W3C HTML specification, though browsers are lenient about it), maybe we could handle it specially...
  • Inline elements, except br and wbr

    • Inline Markdown is valid in them.
    • Pandoc has extension native_spans enabled by default, for <span> elements
    • With it, spans are transformed into the equivalent bracketed_spans, preserving the structure (i.e. the content is nested under the Pandoc.Span element)
      $ pandoc -t json
      <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}]}
      
    • Without it, and for any other inline elements (e.g. <sup> etc.), the HTML is emitted as-is, but flattened: the structure is lost, and the opening tag, the content and the closing tag all end up at the same level
      $ pandoc -t json -f markdown-native_spans
      <span>content _italic_</span>
      {"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},{"t":"RawInline","c":["html","</span>"]}]}]}
      
    • Lunamark has writer.inline_html which is technically equivalent to Pandoc's markdown-native_spans
  • Inline elements, the special case of br and wbr

    • Since they are void elements that are not supposed to have any content (per the W3C HTML specification, though browsers are lenient about it), maybe we could handle them specially...
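
To make the difference between the two Pandoc outputs quoted above concrete, here is a small Python sketch that loads both JSON documents and lists the inline types found directly under the paragraph. The JSON strings are copied from the outputs in this issue; the helper name top_level_inline_types is only an illustration, not something Pandoc or Lunamark provides.

```python
import json

# The two JSON outputs quoted in this issue (pandoc-api-version 1.22.2).
WITH_NATIVE_SPANS = '{"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Span","c":[["",[],[]],[{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]}]]}]}]}'
WITHOUT_NATIVE_SPANS = '{"pandoc-api-version":[1,22,2],"meta":{},"blocks":[{"t":"Para","c":[{"t":"RawInline","c":["html","<span>"]},{"t":"Str","c":"content"},{"t":"Space"},{"t":"Emph","c":[{"t":"Str","c":"italic"}]},{"t":"RawInline","c":["html","</span>"]}]}]}'


def top_level_inline_types(doc_json):
    """Return the types of the inlines sitting directly under the first paragraph."""
    doc = json.loads(doc_json)
    para = doc["blocks"][0]
    return [inline["t"] for inline in para["c"]]


nested = top_level_inline_types(WITH_NATIVE_SPANS)
flat = top_level_inline_types(WITHOUT_NATIVE_SPANS)
print(nested)  # ['Span']
print(flat)    # ['RawInline', 'Str', 'Space', 'Emph', 'RawInline']
```

With native_spans the paragraph has a single Span child that owns the content; without it, the tags are just two RawInline siblings surrounding the content, which is the flattening described above.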

Preliminary conclusions

  • Block elements are hard to reach...
  • Inline elements are hard to reach because their "flattening" loses the hierarchy (... and reconstructing it is probably not a very clever approach), so we can't have e.g. <sup> working without much additional logic. The fact that they allow Markdown content also makes the use of an HTML parsing library very clumsy...
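
For the sake of discussion, here is a rough Python sketch of what "reconstructing the hierarchy" from a flattened inline list would involve. It is not a proposal and not how Pandoc or Lunamark work: the names rebuild, VOID_TAGS and TAG_RE are hypothetical, the input is a hand-written list in the shape of the Pandoc JSON shown above, and the sketch only looks at one level (it does not descend into Emph and friends). It treats br and wbr as leaves, since no closing tag is expected for them.

```python
import re

VOID_TAGS = {"br", "wbr"}  # no content, so no closing tag is expected
# Assumption: a raw inline holds exactly one tag, e.g. "<sup>", "</sup>" or "<br/>".
TAG_RE = re.compile(r"^<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>$")


def rebuild(inlines):
    """Fold a flat inline list back into a tree, using a stack of open tags."""
    root = {"tag": None, "children": []}
    stack = [root]
    for inline in inlines:
        m = None
        if inline["t"] == "RawInline" and inline["c"][0] == "html":
            m = TAG_RE.match(inline["c"][1])
        if m is None:
            stack[-1]["children"].append(inline)  # ordinary Markdown inline
            continue
        closing, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if closing:
            if len(stack) > 1 and stack[-1]["tag"] == name:
                stack.pop()  # matching close: leave the element
            else:
                stack[-1]["children"].append(inline)  # stray close: keep it raw
        elif name in VOID_TAGS or self_closing:
            stack[-1]["children"].append({"tag": name, "children": []})
        else:
            node = {"tag": name, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)  # unclosed tags simply swallow the rest
    return root["children"]


# Hand-written input, shaped like the flattened output above.
flat_example = [
    {"t": "RawInline", "c": ["html", "<sup>"]},
    {"t": "Str", "c": "a"},
    {"t": "RawInline", "c": ["html", "<br>"]},
    {"t": "Emph", "c": [{"t": "Str", "c": "b"}]},
    {"t": "RawInline", "c": ["html", "</sup>"]},
]
tree = rebuild(flat_example)
labels = [child.get("tag") or child["t"] for child in tree[0]["children"]]
print(tree[0]["tag"], labels)
```

Even this toy version has to decide what to do with stray closing tags, unclosed elements and nesting inside other inlines, which gives an idea of the "much additional logic" mentioned above.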
@Omikhleia Omikhleia added this to the 3.0 milestone Nov 21, 2022
@Omikhleia Omikhleia added the enhancement New feature or request label Dec 1, 2022