Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Script Enforcement Mechanism to use flags #533

Merged
merged 5 commits into from
Oct 28, 2024

Conversation

lukewarlow
Copy link
Member

@lukewarlow lukewarlow commented Jul 4, 2024

Also add SVGScriptElement to spec

Fixes #483, #517


Preview | Diff

spec/index.bs Outdated

Modify the [=The text insertion mode=] algorithm as follows:
1. If <var ignore=''>parserChange</var> is false, set [=this=]'s [=HTMLScriptElement/is trusted=] to false.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parserChange is a placeholder for what we end up speccing in whatwg/dom#1288

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which issues, besides the one mentioned in #533 (comment), is this PR intended to fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#525, and #507 I think should both be closable once this PR is finished alongside the SVG specific one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When "parserChange" is false and changed by trusted sink is true, couldn't still malicious code have been injected? E.g. if a trusted sink called only someScript.innerText = someScript.innerText that'd make the untrusted code trusted.

Copy link
Member Author

@lukewarlow lukewarlow Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. if a trusted sink called only someScript.innerText = someScript.innerText that'd make the untrusted code trusted.

That would only work if a default policy had sanctioned that value. Else the assignment would fail before the "changed by trusted sink" Boolean is set

@lukewarlow lukewarlow force-pushed the script-protection-v2 branch from 0c7d33e to f22111f Compare July 4, 2024 14:45
<dt id="scriptEndTag">An end tag whose tag name is "script"</dt>
<dd>
<p>...</p>
1. If [=this=]'s [=HTMLScriptElement/changed by trusted sink=] is true, set [=this=]'s [=HTMLScriptElement/is trusted=] to true.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is used to say hey this is an API change but it's a trusted one. We unset the flag once used.

element</span> <var>script</var>. This might cause some script to execute, which might cause
<span data-x="dom-document-write">new characters to be inserted into the tokenizer</span>, and
might cause the tokenizer to output more tokens, resulting in a [=reentrant invocation of the parser=].</p>
Issue: Need to double check how [part of script element's spec](https://html.spec.whatwg.org/#prepare-the-script-element:~:text=When%20a%20script%20element%20el%20that%20is%20not%20parser%2Dinserted%20experiences) fits into this. These steps need to happen before prepare the script is called.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to change the html spec when upstreaming to run the prepare the script (under relevant conditions) at the end of the children changed steps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly it would rely on having a bit more information in the children changed steps algorithm if we want to inline it. Because it needs to know what type of change it is (insertion specifically in this case).

I suspect this is why some Chrome and WebKit's childrenChanged functions include more than the dom spec's algorithm. (And is why Firefox implements it in a way that also gives them this more granular informaion).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@domfarolino should probably look at this.

Also would that create issues with re-entrant invocations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem that whatwg/html#10188 already changes that part of the HTML spec to be defined in terms of the children changed steps so I think we'd just need to put our new steps first and then run the post-insertion steps and it'll fix the concerns I had here.

Also add SVGScriptElement to spec
@lukewarlow lukewarlow force-pushed the script-protection-v2 branch from f22111f to ac68dd7 Compare July 4, 2024 14:56
@lukewarlow lukewarlow requested a review from annevk July 4, 2024 15:06
spec/index.bs Outdated Show resolved Hide resolved
element</span> <var>script</var>. This might cause some script to execute, which might cause
<span data-x="dom-document-write">new characters to be inserted into the tokenizer</span>, and
might cause the tokenizer to output more tokens, resulting in a [=reentrant invocation of the parser=].</p>
Issue: Need to double check how [part of script element's spec](https://html.spec.whatwg.org/#prepare-the-script-element:~:text=When%20a%20script%20element%20el%20that%20is%20not%20parser%2Dinserted%20experiences) fits into this. These steps need to happen before prepare the script is called.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@domfarolino should probably look at this.

Also would that create issues with re-entrant invocations?

spec/index.bs Outdated Show resolved Hide resolved
@lukewarlow
Copy link
Member Author

lukewarlow commented Jul 9, 2024

@annevk one question I've got with this boolean approach. Does something like .innerText =a/nb cause multiple children changed steps invocations? Wondering if the <br> element generation might somehow mess things up?

Edit: Turns out the bug I found is a pre-existing interop bug

@lukewarlow
Copy link
Member Author

If children changed steps are never called by the parser then I think I can just remove the new boolean and this PR is basically ready to go. I was under the impression they would be called by the parser but I don't think they are.

@domfarolino do you know off the top of your head if that's true? I know you've used them for the post connection steps work youve been doing

lukewarlow added a commit to lukewarlow/WebKit that referenced this pull request Jul 10, 2024
https://bugs.webkit.org/show_bug.cgi?id=276426

Reviewed by NOBODY (OOPS!).

Scripts now have two boolean flags rather than storing a duplicate of their contents.

Spec PR: w3c/trusted-types#533

* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-innerText-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-innerText.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-appendChild-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-appendChild.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-innerHTML-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-innerHTML.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-nodeValue-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-midparse-nodeValue.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-src-expected.txt: Renamed from LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-internal-slot-expected.txt.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-src.html: Renamed from LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-internal-slot.html.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-text-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-text.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-textContent-expected.txt: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/HTMLScriptElement-textContent.html: Added.
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/block-Node-multiple-arguments-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/trusted-types/block-text-node-insertion-into-script-element-expected.txt:
* Source/WebCore/dom/CharacterData.cpp:
(WebCore::canUseSetDataOptimization):
* Source/WebCore/dom/ScriptElement.cpp:
(WebCore::ScriptElement::childrenChanged):
(WebCore::ScriptElement::prepareScript):
(WebCore::ScriptElement::finishParsingChildren): Deleted.
(WebCore::ScriptElement::setTrustedScriptText): Deleted.
* Source/WebCore/dom/ScriptElement.h:
(WebCore::ScriptElement::setChangedByTrustedSink):
* Source/WebCore/html/HTMLScriptElement.cpp:
(WebCore::HTMLScriptElement::setTextContent):
(WebCore::HTMLScriptElement::setInnerText):
(WebCore::HTMLScriptElement::finishParsingChildren): Deleted.
* Source/WebCore/html/HTMLScriptElement.h:
* Source/WebCore/svg/SVGScriptElement.cpp:
(WebCore::SVGScriptElement::finishParsingChildren): Deleted.
* Source/WebCore/svg/SVGScriptElement.h:
@lukewarlow
Copy link
Member Author

I've updated this PR to remove the parserChange boolean because it appears it's not needed. Spec wise the children changed steps seem to never actually get called by the parser. So parserChange is always false (aka it's always an API).

Copy link
Contributor

@fred-wang fred-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukewarlow I guess you are still waiting for a review? I checked the PR and it looks good to me, but I really think someone more familiar with the spec/code should review.

@koto koto merged commit a71f29c into w3c:main Oct 28, 2024
3 checks passed
github-actions bot added a commit that referenced this pull request Oct 28, 2024
SHA: a71f29c
Reason: push, by koto

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Member

@domfarolino domfarolino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this has already landed but I took a very belated look and have at least one question

element</span> <var>script</var>. This might cause some script to execute, which might cause
<span data-x="dom-document-write">new characters to be inserted into the tokenizer</span>, and
might cause the tokenizer to output more tokens, resulting in a [=reentrant invocation of the parser=].</p>
Note: This relies on the children changed steps never being called by the parser.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we assume this? I think these steps are called indirectly by the parser, in the usual insertion path, no? And it seems that browsers expect this too: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/html/html_script_element.cc;l=82;drc=d81683e439e0b80cac98e017d94449d59f48fa0d.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the slow response I've been OoO I didn't neccesarily expect this PR to be merged until someone with more knowledge of the status quo had reviewed, apologies for not making that clearer.

So from all I could work out, the specification children changed steps are only called from APIs and NOT the parser.
If that's not true then this will need reverting.
The ChildrenChanged method implemented in Chromium and WebKit are called by the parser but that's because they aren't a 100% match for the spec.

Fwiw I did a quick prototype in Chromium and it seemed to work as expected (it was a few months back at this point so can't fully remember). I've also got a draft PR to update WebKit to use this new model too WebKit/WebKit#30649.

Copy link
Member

@domfarolino domfarolino Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the slow response I've been OoO

Hah, not a problem, my delay since Anne pinged be about this was far worse.

I didn't neccesarily expect this PR to be merged until someone with more knowledge of the status quo had reviewed, apologies for not making that clearer.

Do you think we should revert it?

So from all I could work out, the specification children changed steps are only called from APIs and NOT the parser.
If that's not true then this will need reverting.

Hmm, spec-wise, I think the children changed steps are indeed called from the parser. At the very least, the HTML's adoption agency algorithm creates an element and then immediately appends it to the DOM, which goes through the https://dom.spec.whatwg.org/#concept-node-insert path, which itself calls the children changed steps.

I am really not familiar with the HTML parser, so I can't find any other paths from parser -> DOM insertion hook (which calls children changed steps), but I'm pretty sure they exist. For example, when an iframe initializes its document, it does so only as a result of its HTML insertion steps, and the only thing that calls these steps is the DOM insertion algorithm—the same thing that calls the children changed steps. So the DOM insertion algorithm must be invoked by the parser, or else I think parser-inserted iframes would never get initialized.

@lukewarlow
Copy link
Member Author

Based on the above @koto could we revert this?

Cc @fred-wang

@lukewarlow
Copy link
Member Author

If we do revert this then we still have spec work required to actually finish this which I personally don't have capacity to continue but I'm happy to help others who wish to do that.

@koto
Copy link
Member

koto commented Nov 21, 2024

Yeah, we can revert this.

koto added a commit that referenced this pull request Nov 21, 2024
koto added a commit that referenced this pull request Nov 21, 2024
github-actions bot added a commit that referenced this pull request Nov 21, 2024
SHA: 3819963
Reason: push, by koto

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@domfarolino
Copy link
Member

If we do revert this then we still have spec work required to actually finish this which I personally don't have capacity to continue but I'm happy to help others who wish to do that.

Could we do to the spec, the same thing that you did with Blink and WebKit to make it work with an implementation that calls the children changed steps from the parser? I guess it seems like the work required here might actually be easier than what was originally proposed, if you already got implementations (that match the spec) working properly?

@lukewarlow
Copy link
Member Author

So yeah this comes back to us needing new data passed down the children changed steps to say whether it's a parser or API change. Idk how much work that is to do, and I don't know if it's compatible with a Mozilla implementation (as WebKit and chrome are quite similar in this respect)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SVGScriptElement needs TT protection too
6 participants