Consider adding a custom 'label' to allow more flexible batching #92

alexmturner · 2023-09-08T18:57:30Z

Currently, the aggregation service only allows each 'shared ID' to be present in one query. A set of reports with the same shared ID cannot be split for separate queries, even if the resulting batches are disjoint.

One option to add more flexibility is to support an optional, custom field (a ‘label’) that is factored into the shared ID generation. We could consider a few different options:

1. Putting the field in the shared_info: The reporting origin would be able to easily split reports into separate batches based on the label. However, this approach would require the label to be set outside the isolated (Shared Storage or Protected Audience) context. It would also require reports to be sent deterministically, as with the context ID, i.e. sending a null report if no contributions are made. This approach is therefore unlikely to work for Protected Audience bidders (see related discussion) and could increase the number of reports sent.
2. Putting the field in the payload: This avoids the deterministic report requirement and would allow the label to be based on cross-site data, i.e. set from inside the isolated contexts. However, it also prevents the reporting origin from directly determining the label embedded in the report. The reporting origin may therefore have to send a larger number of reports to the aggregation service and ask it to filter based on a given set of labels. For certain use cases, the reporting origin may be able to maintain a context ID to label mapping that would avoid this increased scale, albeit less ergonomically than above.
3. Allowing bucket range filtering: Instead of using an explicit label, we could allow filtering based on a range of buckets, with budget only used for that range. This could be more flexible but also increases the complexity of the Aggregation Service’s privacy budgeting implementation.
4. A combination of the above: We could implement several of the above options and allow them to be used together or in different situations.
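To make the difference between the first two options concrete, here is a minimal sketch. The report and payload shapes (`sharedInfo.label`, `contributions[].label`) and both function names are hypothetical, chosen only for illustration; they are not the real report format. With a cleartext label the reporting origin can batch by itself, whereas with a payload label only the service, after decryption, can filter.

```javascript
// Hypothetical shapes for illustration only; not the real report format.

// Option 1: the label sits in cleartext shared_info, so the reporting
// origin can split reports into batches before querying.
function batchBySharedInfoLabel(reports) {
  const batches = new Map();
  for (const report of reports) {
    const label = report.sharedInfo.label ?? "";
    if (!batches.has(label)) batches.set(label, []);
    batches.get(label).push(report);
  }
  return batches;
}

// Option 2: the label sits inside the encrypted payload, so only the
// aggregation service (after decryption) can filter contributions.
function filterContributionsByLabel(decryptedPayloads, allowedLabels) {
  const allowed = new Set(allowedLabels);
  const totals = new Map();
  for (const payload of decryptedPayloads) {
    for (const { bucket, value, label } of payload.contributions) {
      if (!allowed.has(label)) continue; // other labels are left unaggregated
      totals.set(bucket, (totals.get(bucket) ?? 0) + value);
    }
  }
  return totals;
}
```

In the second function the reporting origin would have to send every report that might contain a matching label, which is the scale cost described above.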

For all of the above approaches, we’ll also need a mechanism to limit the scale impact on the Privacy Budget Service. For example, we want to prevent developers from specifying a unique ‘label’ per report. There are a few options we could consider, including:

The Aggregation Service could limit the number of labels/bucket ranges or shared IDs per query
We could limit the space of allowed labels/bucket ranges directly, e.g. only allowing integer labels up to a maximum value.
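As a rough sketch of how both limits might be enforced together, the check below rejects a query that names too many labels or a label outside a bounded integer space. The constants `MAX_LABEL` and `MAX_LABELS_PER_QUERY` and the function name are assumptions made for this example; the issue does not propose specific values.

```javascript
// Hypothetical limits; the issue does not specify concrete values.
const MAX_LABEL = 255;           // largest allowed integer label
const MAX_LABELS_PER_QUERY = 16; // cap on distinct labels in one query

// Returns the de-duplicated, sorted labels, or throws if a limit is exceeded.
function validateQueryLabels(labels) {
  const unique = new Set(labels);
  if (unique.size > MAX_LABELS_PER_QUERY) {
    throw new RangeError(`too many labels: ${unique.size}`);
  }
  for (const label of unique) {
    if (!Number.isInteger(label) || label < 0 || label > MAX_LABEL) {
      throw new RangeError(`label out of range: ${label}`);
    }
  }
  return [...unique].sort((a, b) => a - b);
}
```

Bounding the label space directly also bounds how many distinct budget keys a single shared ID can generate, which is the scale concern for the Privacy Budget Service.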

This functionality would also be useful for the Attribution Reporting API, so we may want to align on an approach. (For example, bucket range filtering has been proposed earlier.) Note that Attribution Reporting does not currently support making deterministic reports.

csharrison · 2023-09-10T07:28:41Z

Thanks Alex. I want to note that the context ID / deterministic reports approach is compatible with the related proposal WICG/attribution-reporting-api#974, although it isn't clear that all deployments could use that option.

michal-kalisz · 2023-09-20T11:55:01Z

Thank you for proposing this solution. It seems to be very interesting.

I'm wondering what exactly assigning a label to PAA data would look like. Would it be possible to assign a label to each key-value pair separately, or only once per entire auction?

We have several use cases in which we would like to use PAA: machine learning, monitoring, and reporting. For example, we would like to report:

privateAggregation.contributeToHistogram({bucket: key1, value: val1, label: "ml"})
privateAggregation.contributeToHistogram({bucket: key2, value: val2, label: "ml"})
privateAggregation.contributeToHistogram({bucket: key3, value: val3, label: "monitoring"})
privateAggregation.contributeToHistogram({bucket: key4, value: val4, label: "monitoring"})
privateAggregation.contributeToHistogram({bucket: key5, value: val5, label: "reporting"})

This is related to the fact that each of these cases has different requirements:

ML expects a large amount of data with low noise - we would like to wait a few hours for this data and query the Aggregation Service for aggregated results.
Monitoring expects data as quickly as possible to diagnose problems rapidly.
Reporting is in between - it expects data broken down by hour but can wait a bit longer for it.

It seems that this can also be achieved using proposal 3 - "bucket range filtering". However, if a label can be attached per individual histogram, this solution seems more convenient.
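For comparison, here is a sketch of how the same three use cases could be separated under bucket range filtering by reserving the high bits of the bucket for a use-case prefix. The 8-bit prefix, the prefix values, and the helper names are assumptions for this example; it also assumes buckets are 128-bit unsigned integers passed as BigInt.

```javascript
// Hypothetical layout: the top 8 bits of a 128-bit bucket name the use case.
const USE_CASE_BITS = 8n;
const KEY_BITS = 128n - USE_CASE_BITS;
const USE_CASES = new Map([["ml", 0n], ["monitoring", 1n], ["reporting", 2n]]);

function prefixFor(useCase) {
  const prefix = USE_CASES.get(useCase);
  if (prefix === undefined) throw new RangeError(`unknown use case: ${useCase}`);
  return prefix;
}

// Builds the bucket for a key within a use case's reserved range.
function bucketFor(useCase, key) {
  if (key < 0n || key >= (1n << KEY_BITS)) throw new RangeError("key too large");
  return (prefixFor(useCase) << KEY_BITS) | key;
}

// Inclusive bucket range a query would filter on for one use case.
function bucketRangeFor(useCase) {
  const start = prefixFor(useCase) << KEY_BITS;
  return { start, end: start + (1n << KEY_BITS) - 1n };
}

function inRange(bucket, { start, end }) {
  return bucket >= start && bucket <= end;
}
```

The trade-off is that every contribution's key space shrinks by the prefix width, whereas a per-contribution label leaves the full bucket space available.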

kwanmacher · 2023-10-11T23:11:12Z

This is a very interesting proposal, thank you!

The support that would be most useful to us is very similar to what @michal-kalisz described above, but applied to ARA summary reporting rather than PAA. We have several use cases with different latency requirements that operate on aggregates whose aggregation keys have very different cardinalities. For example, a reporting use case has many different breakdowns and can wait longer, while a real-time monitoring use case might have far fewer breakdowns but require data to be batched with minimal latency.

Considering that these different use cases will have their values set under different aggregation keys ("reporting", "monitoring") and will collectively share the same total L1 budget for the report, it would be great if we could attach the "label" to each of the aggregation keys (i.e. option 2 + per-key label) and include the same aggregatable report in multiple summary reports, as long as each query uses a disjoint set of labels.
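The "disjoint set of labels" condition could be tracked roughly as follows. This is a minimal sketch assuming budget is keyed on (shared ID, label) pairs; the `LabelBudgetLedger` class and its method are hypothetical and do not describe how the Aggregation Service or Privacy Budget Service is implemented.

```javascript
// Hypothetical ledger: each (sharedId, label) pair may be queried once.
class LabelBudgetLedger {
  constructor() {
    this.consumed = new Set();
  }

  // Records the pairs and returns true only if none was consumed before;
  // otherwise returns false and records nothing.
  tryConsume(sharedId, labels) {
    const keys = [...new Set(labels)].map((label) => JSON.stringify([sharedId, label]));
    if (keys.some((key) => this.consumed.has(key))) return false;
    for (const key of keys) this.consumed.add(key);
    return true;
  }
}
```

Under this model a report batch could feed a low-latency "monitoring" query first and a later "reporting" query, while a second query that reuses any already-queried label for the same shared ID would be rejected as a whole.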

A secondary optimization (which can be built on top) is to go with option 1 and store the set of labels in the shared_info to allow more efficient batching of reports, but this is more of a nice-to-have.

alexmturner · 2023-12-15T21:45:09Z

Thanks for all the feedback! We've put up a proposal that we hope satisfies your use cases: https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md.

Note that the proposal uses different terminology from this issue, but it aligns with Option 2 (with a possible extension of adding Option 1 later). It allows a separate label for each contribution within a report. While the proposal focuses on Private Aggregation, we plan to explore extending it to Attribution Reporting in a separate GitHub issue.
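As a rough illustration of per-contribution labels, the earlier use-case example might look like the sketch below. It assumes, based on my reading of the linked explainer, that the label is passed as a BigInt `filteringId` field on each contribution and that the ID space is a configurable number of bytes. The `makePrivateAggregationStub` helper, the `maxBytes` parameter, the one-byte default, and the ID assignments are all hypothetical stand-ins so the snippet can run outside a browser worklet.

```javascript
// Hypothetical mapping from use case to filtering ID.
const FILTERING_IDS = { ml: 1n, monitoring: 2n, reporting: 3n };

// Stand-in for the worklet's privateAggregation object (an assumption, so the
// sketch runs anywhere). maxBytes models the size of the filtering ID space.
function makePrivateAggregationStub(maxBytes = 1) {
  const limit = 1n << BigInt(8 * maxBytes);
  const recorded = [];
  return {
    recorded,
    contributeToHistogram({ bucket, value, filteringId = 0n }) {
      if (filteringId < 0n || filteringId >= limit) {
        throw new RangeError(`filteringId does not fit in ${maxBytes} byte(s)`);
      }
      recorded.push({ bucket, value, filteringId });
    },
  };
}

const privateAggregation = makePrivateAggregationStub();
privateAggregation.contributeToHistogram({ bucket: 1n, value: 10, filteringId: FILTERING_IDS.ml });
privateAggregation.contributeToHistogram({ bucket: 2n, value: 20, filteringId: FILTERING_IDS.monitoring });
privateAggregation.contributeToHistogram({ bucket: 3n, value: 30, filteringId: FILTERING_IDS.reporting });
```

A query would then name the filtering IDs it wants aggregated, so the monitoring, reporting, and ML contributions from one report could be processed on different schedules.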

Specs the ability to set a filtering ID (and modify the default ID space). See https://github.com/patcg-individual-drafts/private-aggregation-api/blob/main/flexible_filtering.md#proposal-filtering-id-in-the-encrypted-payload and issue #92. To support this new functionality, we increase the report version. Note that this also requires aggregation service versions to support the new version.

alexmturner added the "enhancement" (New feature or request) label on Sep 11, 2023

alexmturner added a commit that referenced this issue Dec 15, 2023

Add more flexible contribution filtering explainer

6b37aa6

Details a proposal for allowing more flexible querying. See #92 for earlier discussion.

alexmturner mentioned this issue Dec 15, 2023

Add more flexible contribution filtering explainer #109

Merged

alexmturner added a commit that referenced this issue Dec 15, 2023

Add more flexible contribution filtering explainer (#109)

66f4bc3

alexmturner mentioned this issue Mar 29, 2024

Spec: filtering IDs #123

Merged
