-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Harden validation of user-submitted LCP background image URL #1713
Conversation
…follow-up PR" This reverts commit 514c513.
Co-authored-by: felixarntz <[email protected]>
plugins/optimization-detective/tests/test-class-od-url-metrics-group-collection.php
Outdated
Show resolved
Hide resolved
… add/external-bg-preload-validation
@felixarntz This is ready for review although I still have yet to:
I intend to do these Friday morning my time, so hopefully this can be reviewed and merged by mid-day Friday PST in preparation for the Monday release. |
I ran out of time to work on this today, unfortunately. So I'm going to pick it up tomorrow. |
… add/external-bg-preload-validation
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter The actual validation logic in Image Prioritizer looks good, but a few points of feedback on the OD API implementation.
plugins/image-prioritizer/helper.php
Outdated
* @noinspection PhpUnhandledExceptionInspection | ||
*/ | ||
wp_trigger_error( __FUNCTION__, $image_validity->get_error_message() . ' Background image URL: ' . $data['url'] ); | ||
$url_metric->unset( 'lcpElementExternalBackgroundImage' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is the only unset()
usage in production code? Since this is a very specific use-case, maybe we can find another solution to make it possible, without allowing to unset metric and element data generally.
Could we for instance generate a copy of the given $url_metric
, without the field that is unset? Obviously that's more or less the same thing, but without the need for an extra method just because of this use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work. Such a sanitization filter would be something like this:
$data = $url_metric->jsonSerialize();
unset( $data['lcpElementExternalBackgroundImage'] );
$url_metric = new OD_Strict_URL_Metric( $data );
The downside here is it's going to end up re-validating the data against the schema twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably fine. I don't think there's a considerable performance cost with that, and it only happens when sending data (vs e.g. on every page load).
* @param OD_Strict_URL_Metric $url_metric URL Metric, already validated against the JSON Schema. | ||
* @param array<string, mixed> $url_metric_data Original URL Metric data before any mutations. | ||
*/ | ||
$validity = apply_filters( 'od_url_metric_storage_validity', true, $url_metric, $url_metric->jsonSerialize() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really storage specific? Why not call it just od_url_metric_validity
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale for this being storage-specific explained in the filter description. In particular, this extra validation is only applied at the time of storage. It doesn't apply when loading URL Metrics from the od_url_metrics
post type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's quite normal though, to validate and sanitize prior to receiving data. When loading from a post type, it's fair to assume that this data is valid - otherwise it should have never been saved to a post in the first place.
I don't think WordPress commonly validates or sanitizes outside of receiving data, so I think mentioning "storage" seems unnecessary.
plugins/image-prioritizer/helper.php
Outdated
/** | ||
* No WP_Exception is thrown by wp_trigger_error() since E_USER_ERROR is not passed as the error level. | ||
* | ||
* @noinspection PhpUnhandledExceptionInspection | ||
*/ | ||
wp_trigger_error( __FUNCTION__, $image_validity->get_error_message() . ' Background image URL: ' . $data['url'] ); | ||
$url_metric->unset( 'lcpElementExternalBackgroundImage' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the unset()
point, this function strictly speaking sanitizes the URL metric, it doesn't just validate it. Given the validity filter name, I would find it more appropriate to throw an exception and call it a day. Validation shouldn't remove invalid properties. Also, since this wp_trigger_error()
would typically happen in the REST API, it would be suppressed even with WP_DEBUG_DISPLAY
, making this harder to spot - so an exception / WP_Error
would be better from that perspective too. Would that not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking behind unset here is that what if someone using an external background image from another domain not in the allowed list of origins. If any exception it thrown then every single URL Metric will be rejected. This is not good because then none of the other data in the URL Metric would ever be stored. What we want in this case is to just exclude that non-required property from the URL Metric as if it wasn't set in the first place. The use of wp_trigger_error()
is to provide a mechanism for a site owner to find out when WP_DEBUG
is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, but then shouldn't we have a filter for sanitization instead for this purpose, rather than validation?
Given this is for now the only usage of the validation filter, but it's not throwing an exception, I think it would make sense to change this to a sanitization filter that acts on the URL_Metric
instance. This would be similar to e.g. how Core has a sanitization filter for options, but not validation. Obviously we could also add a validation extension point, but that could then happen later and doesn't seem as relevant to this as a sanitization filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an alternative to the current validity filter in this PR. We could have a filter that runs on the $data
before it is passed into the OD_Strict_URL_Metric
class for instantiation. So instead of:
performance/plugins/optimization-detective/storage/rest-api.php
Lines 188 to 200 in 6005f4a
try { | |
// The "strict" URL Metric class is being used here to ensure additionalProperties of all objects are disallowed. | |
$url_metric = new OD_Strict_URL_Metric( | |
array_merge( | |
$data, | |
array( | |
// Now supply the readonly args which were omitted from the REST API params due to being `readonly`. | |
'timestamp' => microtime( true ), | |
'uuid' => wp_generate_uuid4(), | |
'etag' => $request->get_param( 'current_etag' ), | |
) | |
) | |
); |
We could have:
try {
$data = array_merge(
$data,
array(
// Now supply the readonly args which were omitted from the REST API params due to being `readonly`.
'timestamp' => microtime( true ),
'uuid' => wp_generate_uuid4(),
'etag' => $request->get_param( 'current_etag' ),
)
);
/**
* Filters the URL Metric data prior to validation for storage.
*
* This can be used to apply additional sanitization to the URL Metric data which
* cannot be done with JSON Schema alone. Additionally, this filter can be used to
* apply validation constraints by throwing an OD_Data_Validation_Exception which
* will block the URL Metric from being stored.
*
* @since n.e.x.t
* @param array<string, mixed> $data URL Metric data. This is the Data type from OD_URL_Metric.
*/
$data = apply_filters( 'od_url_metric_data_pre_storage', $data );
// The "strict" URL Metric class is being used here to ensure additionalProperties of all objects are disallowed.
$url_metric = new OD_Strict_URL_Metric( $data );
In this way, the same filter can be used for both sanitization and validation. Even without explicitly throwing an exception, an exception would end up getting thrown anyway if a filter did something like unset( $data['viewport] )
since that is a required property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter That looks good to me. Though I think that filter name would be better as od_sanitize_url_metric_data
, as it's a more intuitive name in terms of naming structure, and its primary purpose would be to sanitize data.
Makes sense that this filter could also be used for validation, so the try catch
wrap around it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using "sanitize" in the filter name makes sense since other core filters aren't named in that way. It's up to the filter callback to decide whether the purpose is to sanitize. Going the Options API as a model, there is the pre_update_option
filter which is used to (maybe) serialize, how about od_pre_store_url_metric
? Or to follow wp_insert_post()
which has the wp_insert_post_data
filter, what about od_store_url_metric_data
? I like that more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there there is sanitize_option_{$option_name}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using "sanitize" in the filter name makes sense since other core filters aren't named in that way.
I don't think that's true, see e.g. sanitize_option_{$option}
.
It's up to the filter callback to decide whether the purpose is to sanitize. Going the Options API as a model, there is the
pre_update_option
filter which is used to (maybe) serialize, how aboutod_pre_store_url_metric
?
What other use-cases would such a filter have than sanitize? You can throw an exception (effectively validate), but technically you can throw an exception from anywhere, so that to me is not necessarily a reason to change the name. The example of serialization wouldn't be relevant here.
Or to follow
wp_insert_post()
which has thewp_insert_post_data
filter, what aboutod_store_url_metric_data
? I like that more.
This would be an okay name to me if we go with filtering the raw data. But that's not as clean of an API surface as filtering the metric instance (per my other comment), so I would prefer not to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other use-cases would such a filter have than sanitize? You can throw an exception (effectively validate), but technically you can throw an exception from anywhere, so that to me is not necessarily a reason to change the name. The example of serialization wouldn't be relevant here.
A use case could be an extension which amends a stored URL Metric with additional data, like the IP address of the client which submitted the URL Metric.
This would be an okay name to me if we go with filtering the raw data. But that's not as clean of an API surface as filtering the metric instance (per my other comment), so I would prefer not to do that.
I think the API surface is actually cleaner by passing around an array rather then the OD_URL_Metric
instance. See #1713 (comment).
94c5e6b
to
e6cfcfd
Compare
e6cfcfd
to
250f094
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter This looks close, two follow up comments.
* | ||
* @param array<string, mixed> $data URL Metric data. This is the Data type from OD_URL_Metric. | ||
*/ | ||
$data = apply_filters( 'od_url_metric_data_pre_storage', $data ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1713 (comment) and #1713 (comment), I think there are better names for such a filter. Two things:
- It's missing a verb. I think
sanitize
would be suitable. - The mentioning of "pre storage" is unnecessary for this kind of sanitization (or validation), and I think it's unnecessarily verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed the filter in f74f4f4
$data = apply_filters( 'od_url_metric_data_pre_storage', $data ); | ||
|
||
// The "strict" URL Metric class is being used here to ensure additionalProperties of all objects are disallowed. | ||
$url_metric = new OD_Strict_URL_Metric( $data ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not keep filtering the OD_URL_Metric
instance for this, as that would be a better filter interface than dealing with raw data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not because then we're passing around an object to the filters, and objects are passed by reference. And any time that you construct an OD_URL_Metric
it applies all of the JSON Schema for validation. It's also not ergonomic to make a change to a URL Metric via a filter:
function ( $url_metric ) {
$data = $url_metric->jsonSerialize();
unset( $data['bad_key'] );
$new_url_metric = new OD_Strict_URL_Metric( $data );
return $new_url_metric;
}
Compared with:
function ( $data ) {
unset( $data['bad_key'] );
return $data;
}
So I think it is better to pass around the URL Metric data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, by passing around an object then the user needs to be aware of the distinction between OD_Strict_URL_Metric
and OD_URL_Metric
, where the former is intended to be an internal class used just in the REST API endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only for the specific example of unsetting data though (because there's no unset()
method). Regular sanitization would be as simple as:
function ( $url_metric ) {
$url_metric->set( 'prop', 'sanitized-value' );
return $new_url_metric;
}
If you feel strongly about the ergonomics of this for the scenario of unsetting a property, I'd rather bring back the unset()
method than filtering raw data just because of the lack of an unset()
method.
Honestly, at that point, it would be even better to use an action hook, since the object can be modified directly either way, and an exception can still be thrown too, so a filter is not needed. This would also be safer, as a filter would allow to exchange the entire object which, despite me having suggested that earlier in favor of unset()
, also has drawbacks as normally people shouldn't be doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no set()
method either. So you'd have to do the same:
function ( $url_metric ) {
$data = $url_metric->jsonSerialize();
$data['prop'] = 'sanitized-value';
$new_url_metric = new OD_Strict_URL_Metric( $data );
return $new_url_metric;
}
If we were to support set()
then this would bring back concerns about cache invalidation for the group and the collection, and it would be confusing if a user tries to call set()
on a URL Metric from a collection in a tag visitor, as obviously that would not be right since it's only relevant here.
So I think it's preferable for the OD_URL_Metric
object to be immutable, siding with you for suggesting that we don't allow unset()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filters are the middleware.
Not in the sense that I meant it, they don't allow controlling what you can modify. With an array you can do whatever you want. If we later renamed a property and handle it backwards compatibly, or wanted to trigger a warning if someone changes a specific property, things like that, we can't do that if we use an array - but we can if we use a class instance.
I really think this is too much. I think it is somewhat over-engineered.
Can you clarify why? What's "WordPressy" is a bit subjective, and FWIW some parts of WordPress are not necessarily examples for good engineering.
If we want something more structured in the future I suggest deprecating this filter to instead go with the replacement.
That's a short-term option, though it would not be a good long-term solution IMO per my previous comment. Since this is in the milestones due tomorrow, I'd be okay going with it for now, but it's really not great to ship new API surfaces without sufficient consideration just because we want to ship something fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we later renamed a property and handle it backwards compatibly, or wanted to trigger a warning if someone changes a specific property, things like that, we can't do that if we use an array - but we can if we use a class instance.
That's a good point.
Can you clarify why?
It's subjective 😀
What's "WordPressy" is a bit subjective
I'm not sure its subjective. WordPressy is what aligns with previous established patterns in WordPress.
FWIW some parts of WordPress are not necessarily examples for good engineering.
This is true.
I think there is another way to go about this with an existing filter in the REST API without introducing a new sanitization/validation mechanism: the rest_request_brefore_callbacks
filter. I've never used this filter before, but it is billed as being specifically for performing additional validation. I'm investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, check this out: d4c9f40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, if we can implement this without a new API surface in OD entirely, that would simplify things, at least for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that looks great and elegantly avoids expanding the new API. So obviously this is a workaround now, but it gives us time to think whether and how we would want to make this more intuitively possible.
Co-authored-by: swissspidy <[email protected]>
Co-authored-by: swissspidy <[email protected]>
…quest_before_callbacks
This is using get_json_params() instead of get_params() because any URL query parameter would then be included among the params, so by just getting the params from the JSON body we avoid potential failures due to additional query parameters being injected into the request URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the iterations @westonruter!
plugins/image-prioritizer/helper.php
Outdated
// The strtolower() is due to \WP_REST_Server::match_request_to_handler() using case-insensitive pattern match. | ||
OD_REST_API_NAMESPACE . OD_URL_METRICS_ROUTE !== trim( strtolower( trim( $request->get_route(), '/' ) ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By chance I had the thought of checking whether the route was case sensitive. It wasn't. So this would have been a vulnerability. I want to double check how $request->get_route()
gets populated to see if there could potentially be any other ways to make a request to the endpoint and yet bypass this condition here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I found that if you tried making a request without pretty permalinks and added %0A
to the end of the rest_route
query parameter:
http://localhost:8888/index.php?rest_route=/optimization-detective/v1/url-metrics:store%0A
Then the route is parsed as "/optimization-detective/v1/url-metrics:store\n"
and this matches here in \WP_REST_Server::match_request_to_handler()
on this line:
$match = preg_match( '@^' . $route . '$@i', $path, $matches );
This is because $
matches a newline. In reality this should be using \z
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test case to explicitly check for this: 5ab7fd1
b31d894
to
5ab7fd1
Compare
Fixes #1584
Accepting the LCP background image URL from the untrusted client is dangerous as it could lead to abuse. The JSON Schema to enforce the validity of the URL Schema with such a background image URL is not robust enough to ensure that the URL is safe and that it actually references a valid image file. Therefore, a new
od_url_metric_data_pre_storage
filter is introduced in the REST API endpoint. The URL Metric data is passed into this filter, allowing plugins to further sanitize the data prior to storage. Alternatively, if the data cannot be sanitized due to it being invalid, an exception can be thrown (ideally anOD_Data_Validation_Exception
) which blocks the URL Metric from being stored.