fix: [#1615] Fixes problems related to parsing <html>, <head> and <body> using DOMParser #1617

capricorn86 · 2024-11-19T00:46:22Z

No description provided.

OlaviSau · 2024-11-20T08:40:24Z

packages/happy-dom/test/dom-parser/DOMParser.test.ts

 			const newDocument = domParser.parseFromString(DOMParserSVG, 'image/svg+xml');
 			expect(new XMLSerializer().serializeToString(newDocument).replace(/[\s]/gm, '')).toBe(
 				DOMParserSVG.replace(/[\s]/gm, '')
 			);
 		});
+


If you're planning on handling all the cases, then I suggest these as well:

export const DOMParserBODYSibling = '<body><example></example>Example Text</body><body></body>';
export const DOMParserBODYChildren = '<body><body></body><example></example>Example Text</body>';

it('does not duplicate body when BODY is a child', () => {
	const newDocument = domParser.parseFromString(DOMParserBODYChildren, 'text/html');
	expect(newDocument.body.innerHTML).toBe('<example></example>Example Text');
});

it('does not duplicate body when BODY is a sibling', () => {
	const newDocument = domParser.parseFromString(DOMParserBODYSibling, 'text/html');
	expect(newDocument.body.innerHTML).toBe('<example></example>Example Text');
});

packages/happy-dom/src/html-parser/HTMLParser.ts

packages/happy-dom/src/utilities/XMLEncodeUtility.ts

OlaviSau · 2024-12-19T20:37:50Z

packages/happy-dom/src/config/HTMLElementConfig.ts

@@ -128,11 +138,13 @@ export default <
 	},
 	col: {
 		className: 'HTMLTableColElement',
-		contentModel: HTMLElementConfigContentModelEnum.noDescendants
+		contentModel: HTMLElementConfigContentModelEnum.noDescendants,
+		permittedParents: ['colgroup']
 	},
 	colgroup: {
 		className: 'HTMLTableColElement',


Is the class name meant to be the same as for the previous element (col)?
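
For context, the HTML Standard assigns the HTMLTableColElement interface to both `col` and `colgroup`, so the shared class name in the hunk above matches the standard. The sketch below shows one way a `permittedParents` entry could be consulted. It assumes the field means "this tag may only appear directly inside one of the listed parents"; the names `IElementConfigSketch`, `CONFIG_SKETCH`, and `isPermittedParent` are hypothetical and not taken from the pull request.

```typescript
// Hypothetical, trimmed-down shape of a config entry: only the two
// fields relevant here (className and permittedParents) are modelled.
interface IElementConfigSketch {
	className: string;
	permittedParents?: string[];
}

// Illustrative subset of the config. Both tags share one class name.
const CONFIG_SKETCH: Record<string, IElementConfigSketch> = {
	col: { className: 'HTMLTableColElement', permittedParents: ['colgroup'] },
	colgroup: { className: 'HTMLTableColElement' }
};

// Assumed semantics: a tag without a permittedParents list is allowed
// anywhere; otherwise its parent must be one of the listed tag names.
function isPermittedParent(tagName: string, parentTagName: string): boolean {
	const config = CONFIG_SKETCH[tagName.toLowerCase()];
	if (!config || !config.permittedParents) {
		return true;
	}
	return config.permittedParents.includes(parentTagName.toLowerCase());
}
```

What the parser does when the check fails (drop the element, re-parent it, or something else) is not visible in this hunk and is left out of the sketch.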

packages/happy-dom/src/html-parser/HTMLParser.ts

packages/happy-dom/src/xml-parser/XMLParser.ts

@@ -25,30 +18,31 @@
 * Group 7: End of self closing start tag (e.g. "/>" in "<img/>").
 * Group 8: End of start tag (e.g. ">" in "<div>").
 */
-const MARKUP_REGEXP =
-	/<([^\s/!>?]+)|<\/([^\s/!>?]+)\s*>|<!--([^-]+)-->|<!--([^>]+)>|<!([^>]*)>|<\?([^>]+)>|(\/>)|(>)/gm;
+const MARKUP_REGEXP = /<([^\s/!>?]+)|<\/([^\s/!>?]+)\s*>|(<!--)|(-->)|(<!)|(<\?)|(\/>)|(>)/gm;


The best way to fix the problem is to replace the custom regular expression with a well-tested HTML parser library. This will ensure that all edge cases and variations of HTML syntax are correctly handled, reducing the risk of security vulnerabilities.

To implement this fix, we will:

1. Install a well-known HTML parser library, such as htmlparser2.
2. Replace the custom regular expression with the HTML parser library to handle the parsing of HTML content.
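
For reference, the reworked `MARKUP_REGEXP` in the hunk above no longer captures whole comments, doctypes, or processing instructions. It matches only delimiters such as `<!--` and `-->`. Below is a minimal sketch of how such a delimiter-only pattern splits a string into tokens. The pattern is copied from the `+` line of the diff. `tokenize`, `IToken`, and the group labels are hypothetical names used for illustration and are not taken from the pull request.

```typescript
// Pattern copied from the "+" line of the diff above.
const MARKUP_REGEXP = /<([^\s/!>?]+)|<\/([^\s/!>?]+)\s*>|(<!--)|(-->)|(<!)|(<\?)|(\/>)|(>)/gm;

// Hypothetical labels for capture groups 1 to 8, in order.
const GROUP_LABELS = [
	'startTagName',
	'endTagName',
	'commentOpen',
	'commentClose',
	'exclamationOpen',
	'processingInstructionOpen',
	'selfClosingEnd',
	'startTagEnd'
];

interface IToken {
	kind: string;
	value: string;
	index: number;
}

// Collects one token per match, labelled by the capture group that matched.
function tokenize(markup: string): IToken[] {
	const tokens: IToken[] = [];
	// Fresh RegExp instance so lastIndex state is not shared between calls.
	const regexp = new RegExp(MARKUP_REGEXP, 'gm');
	let match: RegExpExecArray | null;
	while ((match = regexp.exec(markup)) !== null) {
		for (let group = 1; group <= 8; group++) {
			if (match[group] !== undefined) {
				tokens.push({ kind: GROUP_LABELS[group - 1], value: match[group], index: match.index });
				break;
			}
		}
	}
	return tokens;
}
```

For the input `<div><!-- a - b --></div>` this yields, in order, a start tag name, a start tag end, a comment open, a comment close, and an end tag name. How the actual parser tracks state between these tokens is not shown here.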

fix: [#1615] Fixes problems related to parsing <html>, <head> and <bo…

25053c8

…dy> using DOMParser

capricorn86 linked an issue Nov 19, 2024 that may be closed by this pull request

DOMParser does not recognise BODY and thus creates it twice #1615

Open

github-actions bot assigned capricorn86 Nov 19, 2024

chore: [#1615] Fixes problems related to parsing <html>, <head> and <…

1fb5412

…body> using DOMParser

capricorn86 mentioned this pull request Nov 19, 2024

fix: [#1615] DOMParser creating body twice #1616

Closed

OlaviSau reviewed Nov 20, 2024

capricorn86 added 7 commits November 29, 2024 00:58

chore: [#1615] Continues on implementation

7e84f3b

chore: [#1615] Continues on implementation

02c065c

chore: [#1615] Continues on implementation

38b62c6

chore: [#1615] Continues on implementation

62dc2f2

chore: [#1615] Continues on implementation

88cd748

chore: [#1615] Continues on implementation

3c6ce0e

chore: [#1615] Continues on implementation

b5d0b2e

github-advanced-security bot found potential problems Dec 7, 2024

packages/happy-dom/src/html-parser/HTMLParser.ts (Fixed)

capricorn86 added 3 commits December 10, 2024 23:39

chore: [#1615] Continues on implementation

c1b1b40

chore: [#1615] Continues on implementation

dbe90fa

chore: [#1615] Continues on implementation

6782f57

github-advanced-security bot found potential problems Dec 13, 2024

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

chore: [#1615] Continues on implementation

8302176

github-advanced-security bot found potential problems Dec 13, 2024

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

chore: [#1615] Continues on implementation

2bf2349

github-advanced-security bot found potential problems Dec 17, 2024

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

packages/happy-dom/src/utilities/XMLEncodeUtility.ts (Fixed)

OlaviSau reviewed Dec 19, 2024

chore: [#1615] Continues on implementation

3ff4650

github-advanced-security bot found potential problems Dec 20, 2024

packages/happy-dom/src/html-parser/HTMLParser.ts (Fixed)

chore: [#1615] Continues on implementation

700f16e

github-advanced-security bot found potential problems Dec 21, 2024

chore: [#1615] Continues on implementation

9407af9

@@ -7,16 +7,32 @@
             import XMLEncodeUtility from '../utilities/XMLEncodeUtility.js';
+            import { Parser } from 'htmlparser2';
-            /**
-             * Markup RegExp.
-             *
-             * Group 1: Beginning of start tag (e.g. "div" in "<div").
-             * Group 2: End tag (e.g. "div" in "</div>").
-             * Group 3: Comment with ending "--" (e.g. " Comment 1 " in "<!-- Comment 1 -->").
-             * Group 4: Comment without ending "--" (e.g. " Comment 1 " in "<!-- Comment 1 >").
-             * Group 5: Exclamation mark comment (e.g. "DOCTYPE html" in "<!DOCTYPE html>").
-             * Group 6: Processing instruction (e.g. "xml version="1.0"?" in "<?xml version="1.0"?>").
-             * Group 7: End of self closing start tag (e.g. "/>" in "<img/>").
-             * Group 8: End of start tag (e.g. ">" in "<div>").
-             */
-            const MARKUP_REGEXP = /<([^\s/!>?]+)|<\/([^\s/!>?]+)\s*>|(<!--)|(-->)|(<!)|(<\?)|(\/>)|(>)/gm;
+            // Removed custom regular expressions
+            // Add a function to parse HTML using htmlparser2
+            function parseHTML(html: string) {
+                const parser = new Parser({
+                    onopentag(name, attribs) {
+                        console.log("Open tag:", name, attribs);
+                    },
+                    ontext(text) {
+                        console.log("Text:", text);
+                    },
+                    onclosetag(tagname) {
+                        console.log("Close tag:", tagname);
+                    },
+                    oncomment(data) {
+                        console.log("Comment:", data);
+                    },
+                    onprocessinginstruction(name, data) {
+                        console.log("Processing instruction:", name, data);
+                    }
+                }, { decodeEntities: true });
+                parser.write(html);
+                parser.end();
+            }
+            // Example usage of parseHTML function
+            const exampleHTML = '<div name="value">Example</div>';
+            parseHTML(exampleHTML);

@@ -80,3 +80,4 @@
             		"webidl-conversions": "^7.0.0",
-            		"whatwg-mimetype": "^3.0.0"
+            		"whatwg-mimetype": "^3.0.0",
+            		"htmlparser2": "^9.1.0"
             	},

Package	Version	Security advisories
htmlparser2 (npm)	9.1.0	None
