Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming of the innerHTML setter (e.g., setInnerHTML.) #100

Closed
mozfreddyb opened this issue Jun 15, 2021 · 33 comments
Closed

Naming of the innerHTML setter (e.g., setInnerHTML.) #100

mozfreddyb opened this issue Jun 15, 2021 · 33 comments

Comments

@mozfreddyb
Copy link
Collaborator

I don't like the naming of setInnerHTML. In the security industry, we said so much about avoid using innerHTML, that library like React use the name called dangerouslySetInnerHTML. It's super hard to tell people now that "Hey! setInnerHTML is safe!". At the very least, it's confusing for developers. setHTML (or setSanitizedHTML) sounds better.

Originally posted by @shhnjk in #42 (comment)

@koto
Copy link

koto commented Jun 15, 2021

If we need a new function to set innerHTML, it should support extensions in the future not to be tied to the sanitizer only. I am not sure if the setInnerHTML(value, {optionBag}) signature is sufficient (or the most ergonomic), but only for one reason, and a speculative one.

I think it might be valuable for the platform in the future to distinguish whether innerHTML value to be set is a literal, or a dynamic value (and e.g. block the dynamic ones). This needs some plumbing in EcmaScript. We've explored that subject in the past (https://github.com/mikewest/tc39-proposal-literals), and it seems like the most viable solution is to rely on template literals and tag functions, i.e. it might only be possible to reject a value if a following syntax is used:

assertThisIsALiteral`i am not dynamic`

If the innerHTML setter signature uses the option bag, it makes it hard to add support for the above - it would have to use something like:

el.setInnerHTML(ignoreMe, {literal: assertThisIsALiteral`the actual value`})

One option to allow for this extension would be to have dedicated setters instead of one, e.g.:

el.setInnerHTML.sanitized(dirty, {sanitizer})
el.setInnerHTML.literal`i am constant`
el.setInnerHTML.unsafe("please block me, CSP!")

That way the argument list, and its options can be freely chosen by the appropriate setter variant. That way it's also probably easier to reason about the code statically.

// cc @mikewest

@mozfreddyb
Copy link
Collaborator Author

I like @shhnjk's point, that an innerHTML-alike name might confuse people who remember that "innerhtml is dangerous".
I'm also very sympathetic to the point you are making (which is also a concern within TypeScript) where you couldn't easily decide whether the optionBag contains a sanitizer or not.

@otherdaniel, and Mario (@cure53) talked about it and in lack of a better term arrived at content, which is already an existing property name for a template element.

So, how about Node.setContent(DOMString input, Sanitizer mySanitizer)?

@koto
Copy link

koto commented Jun 15, 2021

how about Node.setContent(DOMString input, Sanitizer mySanitizer)?

That assumes that the sanitizer will be the only API ever allowed to set the content. How about Node.content.setSanitized(DOMString input, Sanitizer mySanitizer) ?

@mozfreddyb
Copy link
Collaborator Author

what is a bare Node.content supposed to return here?
We thought about an adjective or a describer for the set part, but the sanitizer is non-optional, so it's maybe redundant?
We also thought about setSanitized but then maybe the question is "set sanitized WHAT"? (inner, outer, text, content, ...)

setContentSanitized or setSanitizedContent are also terribly long and boring..

@koto
Copy link

koto commented Jun 15, 2021

what is a bare Node.content supposed to return here?

{setSanitized: Function} ? That way you can feature test all the future extensions of the API.

We thought about an adjective or a describer for the set part, but the sanitizer is non-optional, so it's maybe redundant?

It's only required for the setter that's plumbed to the sanitizer (and there are already requests for other setters e.g. whatwg/dom#912). I suspect there would be setters that have a combination of required and optional arguments - the most ergonomic way of supporting that, I think, is to make each of those variants a separate function with its own signature. We could group them under Node.content or not.

We also thought about setSanitized but then maybe the question is "set sanitized WHAT"? (inner, outer, text, content, ...)

setContentSanitized or setSanitizedContent are also terribly long and boring..

That would work, I think. I have a slight preference towards precise API surface rather than a generic endpoint that can do many things depending on arguments. Predictability is boring, and yet we want the API to intuitively describe what it's doing, no?

@shhnjk
Copy link

shhnjk commented Jun 16, 2021

Isn't setSanitizedHTML better? The innerHTML is bad naming, but HTML isn't.

@mozfreddyb
Copy link
Collaborator Author

Thinking about this, I do like setSanitizedHTML() better than setContent().
However, I understand that having a .content might help exposing other use-cases the DOM is having (or is really going to have soon), that's something we should try to answer. Paging @domenic and @annevk again.

@annevk
Copy link
Collaborator

annevk commented Jun 17, 2021

setHTML() seems nice in that it's both clear it will invoke the HTML parser and it beats innerHTML assignment by one character (being generous toward innerHTML here by omitting spaces):

e.setHTML("test");
e.innerHTML="test";

I'm not sure we need to make it clear it's sanitized. All new parser APIs should be safe-by-default from now on I think.

I'm also not sure I understand @koto's concern. It seems that requiring literals would either be a higher-level switch, a change to the API, or perhaps something you enforce through an option? Any of these would be compatible as far as I can tell. But I'd love to learn more as it does seem like something that would be good to get right from the start.

@mozfreddyb
Copy link
Collaborator Author

the function needs to accept a sanitizer instance @annevk. The sanitizer is not bound to some per-global or per-document state.
so we're either at el.setSanitizedHTML(input, sanitizerObj) or at el.setHTML(input, sanitizerObj).

@annevk
Copy link
Collaborator

annevk commented Jun 17, 2021

Of course, but that's an optional argument right? I'm not sure how that relates to my comment.

@otherdaniel
Copy link
Collaborator

Of course, but that's an optional argument right?

I don't think that can be an optional argument. The point of the API is to give developers an easy-to-use setter without any built-in footguns. A function that gives hard security guarantees if the 2nd parameter is present and guaranteed insecurity if it's missing doesn't really meet that goal. The safe setter - whatever it ends up being called - should either be safe, or fail.

@annevk
Copy link
Collaborator

annevk commented Jun 17, 2021

Why can the sanitizer not have safe defaults? What exactly do you need to configure there for it to work?

@koto
Copy link

koto commented Jun 17, 2021

It seems that requiring literals would either be a higher-level switch, a change to the API, or perhaps something you enforce through an option?

As far as I can tell, the most viable option for the platform APIs to distinguish literals from other values, is if the function accepting the value is a tag template function - which ties our hands w.r.t. to the function signature (it's Function(templateStringsArray, ...argumentsToInterpolate)). We could, for example, do:

 el.setHTML(input, sanitizer)
 // and, in the future
 el.setHTML.literal`this_is_not_injected_for_sure`

I think what's important is that the new setter (and all variants of it) is strictly more secure than innerHTML. Which is why e.g. the sanitizer cannot be optional in a variant that sanitizes (at least in the version put forward, there is no default sanitizer that is always available and called implicitly). And in general I would prefer if there's no assumption that the setHTML setter is only paired with the sanitizer. Sanitizing HTML is not the only way to make it safe (w.r.t. XSS or otherwise).

@otherdaniel
Copy link
Collaborator

Why can the sanitizer not have safe defaults? What exactly do you need to configure there for it to work?

The sanitizer has safe defaults. Was your intent that .SetHTML("xxx") would imply that a default-constructed Santizer instance is applied? That would be viable, as it meets the safe-or-fail goal.

I personally would prefer an explicit instance over an implied one, as I find that less surprising. On the plus side, Element.SetHTML with built-in Sanitizer might increase usage of the Sanitizer.

@annevk
Copy link
Collaborator

annevk commented Jun 17, 2021

Yeah, I was expecting the second argument to merely be some options that have safe defaults, but can be overridden if you need more control over sanitization. I don't think we want everyone that has to parse a string to have to explicitly consider and configure a sanitizer.

@koto given that we cannot distinguish that type from a string it's still unclear to me what that means. And even once we can distinguish it, what it would mean then.

@koto
Copy link

koto commented Jun 17, 2021

@koto given that we cannot distinguish that type from a string it's still unclear to me what that means.

We can, from within the tag template function. The first argument is a frozen array of strings, and the whole array is in the Realm's [[TemplateMap]] slot. https://github.com/tc39/proposal-array-is-template-object proposes to expose this check to regular JS, but even without this, one can distinguish template literals from strings using the above machinery. Again, there's a whole history of how we got there, starting from https://github.com/mikewest/tc39-proposal-literals.

And even once we can distinguish it, what it would mean then.

I think it would be huge for combating injections, if platform was able to distinguish literals from dynamic values, as the first ones are by definition non-user-controllable without a very specific kind of server-side injection.

@shhnjk
Copy link

shhnjk commented Jun 17, 2021

Regarding the naming, setHTML LGTM 😊

given that we cannot distinguish that type from a string it's still unclear to me what that means. And even once we can distinguish it, what it would mean then.

To give more high-level description, Web Platform currently doesn't have a way to say developers are assigning static string or dynamic string.
That is,

elem.innerHTML = "<b>hello</b>";

AND

elem.innerHTML = "<b>" + location.hash.slice(1) + "</b>";

Are both string assignment (even though one is static and one is dynamic). What @koto suggested is to give a primitive to developers, where the platform can understand the notion of static string assignment. This primitive is important in the platform, because if we know something is static, it is safe to append without sanitization.

BTW, elem.setHTML.withLiterals`this_is_not_injected_for_sure` sounds better (remember, this is the thread to decide namings 😋)

@Sora2455
Copy link

I think it would be huge for combating injections, if platform was able to distinguish literals from dynamic values, as the first ones are by definition non-user-controllable without a very specific kind of server-side injection.

Very specific, but very common. (Though if anyone has actual stats for Reflected XSS (I couldn't find any with a cursory search) that would be great).

@koto
Copy link

koto commented Jun 18, 2021

Very specific, but very common.

I realize we're diverging from the thread, but I highly doubt that reflections inside template strings inside a script are common. That would require something like this on the server side:

<script>
foo = `bar<?php echo $_GET['xss'] ?>`
</script>

Additionally, this would only be risky if the injection was not able to escape from the template, but was able to inject HTML, which seems quire rare. Regardless, the best the platform can do to combat this, and other server-side XSSes is CSP script-src.

I don't want this issue to become about the literals, I only want to highlight that other setters might be worthy creating, so let's not just assume that sanitizer-bound setter is the only one to exist, so perhaps (.setHtml.sanitized(input, sanitizer) or el.setSanitizedHTML(input, sanitizer)) are better entry points for it.

@mozfreddyb
Copy link
Collaborator Author

I understand that you might want a function that can be easily used with a tagged template string, @koto, and I think it's really worth discussing but let's please move that to a different issue.

Are there any objections to .setHTML(input, sanitizer)?
Anne says we can consider the sanitizer optional and use the default sanitizer config, which would be very fine.
Note that the default config is promised to be XSS-safe within the boundaries of our Security Considerations, i.e., no protections against gadget XSS.

@koto
Copy link

koto commented Jun 23, 2021

Are there any objections to .setHTML(input, sanitizer)?

Yes. For the reasons outlined, I prefer .setSanitizedHTML or setHTML.sanitize(d) or .html.setSanitized. Creating a setter with a generic name setHTML and hooking it to the sanitizer (explicitly, or implicitly) limits how this could be extended in the future.

@koto
Copy link

koto commented Jun 23, 2021

Looking at the #102, Element.setSanitizedHTML(input, sanitizer?) and Element.setRawHTML would share the same naming scheme. What's the advantage of a generic setHTML?

@annevk
Copy link
Collaborator

annevk commented Jun 23, 2021

Mainly that it's shorter and can then be more easily advertised (possibly together with raw) as the replacement for innerHTML. It would also mean that any future "set HTML" options would not feel out-of-place and would be a natural addition to that API.

@koto
Copy link

koto commented Jun 23, 2021

I see, those are valid advantages. The disadvantage I see is a slight surprise factor - it doesn't just "set" HTML, it's modifying it. Another variant might be HTMLElement.sanitize(input, sanitizer?)? It's the same length as innerHTML=, and its name describes exactly what would happen - with a downside of not being generic.

With the option for setRawHTML it's mostly cosmetic choice at this point - if we want to go with the one generic function and option bags, or a few specific per-purpose functions. I prefer the latter shape, but don't want to block on this.

@otherdaniel
Copy link
Collaborator

Some thoughts:

  • I can live with any of the names.
  • I like SetHTML with an optionally implied Sanitizer from an 'advertising' perspective, in that we can explain it as a better .innerHTML. I find it questionable from the principle-of-least-surprise perspective, since I wouldn't expect a generically named function to perform this fairly specific operation. I'd (slightly) prefer to be explicit, so that if a user wants sanitization, they should have to say so.
  • I'm skeptical on the extensibility. I wonder if there's a list of candidate extensions people have in mind?

@mozfreddyb
Copy link
Collaborator Author

As of now, the only other option that is currently discussed is a parsing option whether to parse a declarative shadow DOM (whatwg/dom#912).

If that parsing option is to become a thing, we can't implement our parsing invocation without taking options for the parser.
Which brings us to whateverTheMethodIsCalled(input, optionsBag) where the optionsBag contains a sanitizer property (e.g, {sanitizer: mySanitizer } (or the default sanitizer will be used).

That said, it would be a bit surprising if someone wants to setHTML according to the declarative shadow DOM use-case only and find their string to be sanitized? I mean, I like an always-secure setter, but I can also see this as very confusing :|

@otherdaniel
Copy link
Collaborator

As of now, the only other option that is currently discussed is a parsing option whether to parse a declarative shadow DOM (whatwg/dom#912).

Thanks, I hadn't seen that. If that indeed becomes a thing, then I presume any parsing Sanitizer function would want the option, in which case it'd be a natural fit for the Sanitizer config. (See also #45.)

That said, it would be a bit surprising if [...]

IMHO, that speaks for explicit-ness in some form. Either by having a required (not optional) Sanitizer argument, or by referencing it in the name. (That said, I acknowledge this is a matter of taste, and other people will see it differently. I can live with it either way.)

@annevk
Copy link
Collaborator

annevk commented Jun 23, 2021

It could be reasonable to support { sanitizer: "unsafe-disable" } at some point if that is something that is needed (perhaps if it then required a literal it wouldn't even have to be named unsafe). What I'm mainly looking for in the innerHTML replacement:

  1. Safe-by-default.
  2. As short or shorter than innerHTML so it doesn't lose out on ergonomics. For something that you need to use often this is more important than it may seem.
  3. Extensible through the second argument in some way. This could be overloading with a dictionary at some point or having a dictionary there from the start. But we shouldn't have the innerHTML replacement constrained in such a way that this is not possible in the future.

@otherdaniel
Copy link
Collaborator

Hmm. If I take Anne's requirements, and my (admittedly rather sketchy) understanding of WebIDL's overloading rules, then we should have two overloads

  • Element.SetHTML(DOMString)
  • Element.SetHTML(DOMString, Sanitizer)

(Merging those by making Sanitizer optional would make it difficult to define future overloads. Although I take it that's a tricky topic.)

The single param version would be whatever the platform considers to be sensible default behaviour; I assume with default sanitization with whatever current or future defaults we pick. Future overloads could offer any combination of additional arguments or different types, since a (non-nullable) interface is distinguishable from pretty much anything else.

@shhnjk
Copy link

shhnjk commented Jun 24, 2021

If we are going to this route, I would prefer to have the default option as literal.

That is:

  • setHTML`<script>alert(1)</script>` (just works because static HTML assignment).
  • setHTML(“<script>alert(1)</script>”, {/*use sanitizer*/ true, /*optional sanitizer Object*/ sanitizer})

This way, the latter object can have room for extension in the future.

@mozfreddyb
Copy link
Collaborator Author

setHTML 🎉

@koto
Copy link

koto commented Jul 13, 2021

+1 to setHTML !

The tag functions have a specific signature (TemplateArray tpl, any... interpolates), that doesn't match the signature of the second variant (string payload, object optionBag). To merge those into a single setHTML, we'd need something like this maybe?

interface mixin InnerHTML {
  [CEReactions] attribute [LegacyNullToEmptyString] DOMString innerHTML;
  undefined setHTML(object|DOMString innerHTML, optional any... arguments);
};

setHTML algorithm would have to branch on the type of the first argument. It's a bit ugly as the typing is mostly lost, but the shape exposed to the webdevs - a single function - is very useful I think. Perhaps even using

undefined setHTML(object|DOMString innerHTML, optional object options);

would work, since we're not expecting any interpolation anyway.

@mozfreddyb
Copy link
Collaborator Author

This issue is merely about the name. The template thing was discussed in #102 ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants