-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider padding the payload #56
Comments
Note that the payload being padded is encrypted and only processable by the aggregation service. However, if debug mode is enabled, the cleartext is available; so the main backwards compatibility concern comes from developers using that field. To make feature detection simpler, we could update the version field while making this change. |
Exploring the options we have for padding the aggregatable payload. Note that the current encoding is specified here. Append null contributions to a fixed numberOne way to mitigate the issue above is to simply add null contributions (i.e. contributions with a value of zero) to the payload until the total number of contributions is fixed at the maximum.
[Preferred] Append bytes to a fixed lengthWe could instead calculate some maximum size of the payload and pad the CBOR encoding out to that maximum size. Then, when the aggregation service processes the payload, it would first strip out these extra bytes. The calculation for the number of additional bytes needed for this padding depends on the encoding choices made, but we expect to need up to around 500 bytes of padding in the typical case. If we allow setting the maximum number of contributions up to 50, this could go up to around 1.2 kB. The addition of other operations could expand this significantly, but that is out of scope for now. There are a few different options for this padding scheme: ISO/IEC 7816-4One byte of 0x80 is appended. After that, as many 0x00 (null) bytes are appended as needed to reach the desired length.
Extension to ANSI X9.23While we can't use ANSI X9.23 (as it only supports up to 255 bytes of padding), we could extend the concept of ANSI X9.23 to larger padding values. For this, we would add as many 0x00 (null) bytes as needed to reach the desired length. Then, replace the last two bytes with a representation of the count of bytes added as a 16-bit big-endian unsigned integer. (Note that no fewer than two bytes can be used to pad in this case.) This extension would support up to 64 KiB of padding. While this is more than we need currently, we could also consider using the last four bytes to represent a 32-byte integer; this approach would support up to 4 GiB of padding.
Some other encoding scheme? |
Are we proposing to make this change along with a change to the CBOR structure to optimize payload size for normal contributions as well? That might reveal a preferred encoding. Note that the linear scan for |
Yes, we are -- I can try and post some options for encoding optimizations here soon. Although I don't think that should affect the choice of padding (except for needing to keep the constant per-contribution size if go with appending null contributions). Re: detecting 0x80, we'd have to be careful to distinguish 0x80 appearing in the CBOR serialization vs after it. (And, technically, I think we'd also want to scan the remaining bytes in case there's a later 0x80.) In theory, I think it should be possible to detect when the serialization ends as CBOR encodes how long each map/array should be. But we'd need to ensure that the CBOR parsers used do support this (e.g. Chromium's currently treats trailing bytes as errors). |
Padding will cause the size of the payload to increase. To mitigate this increase, I've put together some options for improving the efficiency of the payload encoding. Making both changes at the same time would avoid multiple aggregation service version changes. Analysis of the current encodingThe current encoding is specified here. // CBOR
{
"operation": "histogram",
"data": [{
"bucket": <16-byte big-endian bytestring>,
"value": <4-byte big-endian bytestring>
}, ...]
} The current CBOR encoding uses 27 bytes for an unencrypted payload with no contributions. Encryption adds 48 bytes of overhead (32 bytes for the encapsulated secret and 16 bytes for AEAD overhead), leading to 75 bytes for an encrypted empty payload. Then, each contribution adds an additional 36 bytes. Summary:
Removing repeated stringsThere are a few options for removing the repeated “bucket” and “value” strings. Replace with nested arraysWe can replace each contribution map with a two-member array: // CBOR
{
"operation": "histogram",
"data": [
[<16-byte big-endian bytestring>, <4-byte big-endian bytestring>],
...]
} This reduces the per-contribution variable component to 23 bytes. Summary:
Replace with mapWe can replace the array of contribution maps with a single bucket to value map. Note that this requires contributions to the same bucket to be consolidated. // CBOR
{
"operation": "histogram",
"data": {
<16-byte big-endian bytestring>: <4-byte big-endian bytestring>,
...}
} This reduces the per-contribution variable component to 22 bytes. Summary:
Replace with flat arrayWe can replace the array of contribution maps with an even-length array. This is equivalent to the 'Replace with nested arrays' option, but flattening the nested two-item arrays. // CBOR
{
"operation": "histogram",
"data": [
<16-byte big-endian bytestring>, <4-byte big-endian bytestring>,
... (repeated unnested as necessary)]
} This reduces the per-contribution variable component to 22 bytes. However, it worsens the semantic structure of the encoding. Summary:
Removing top-level stringsWe could also consider reducing the fixed cost, i.e. the size of a payload with no contributions, by removing the “operation” and “data” strings. We could then replace them with an array, e.g.: // CBOR
[
"histogram",
{<16-byte big-endian bytestring>: <4-byte big-endian bytestring>,
...} // or any other option above
] This reduces the fixed size by 15 bytes, i.e. from 27 bytes unencrypted to 12 bytes unencrypted and from 75 bytes encrypted to 60 bytes encrypted. This is a limited enough benefit that it may be preferable to keep the current semantic structure. Summary:
Miscellaneous other changesThese changes don't affect the maximum size of the encoding, but we could consider them while making other changes. Switch value encoding to an integerIf we no longer rely on the encoding of a contribution to be a fixed length, we can consider switching the encoding of the value from a 4-byte big-endian bytestring to a simple integer, e.g.: // CBOR
{
"operation": "histogram",
"data": {
<16-byte big-endian bytestring>: <regular integer>,
...}
} This would not change the maximum size if we want our encoding to support the same 32-bit integer space, but it may be more ergonomic. (If we decide the encoding should not support numbers beyond 2^16 − 1, we could reduce the per-contribution size by 2. However, this would also be possible for the current bytestring encoding by reducing it to 2 bytes.) Removing null contributionsIf we choose a fixed length padding scheme, there is no longer a need for null reports to contain a ‘null’ contribution (i.e. with value = 0). Instead, we could simply use an empty array/map of contributions. This would not change the maximum size, but it would mean the payload semantics would better match the contents. |
Ensures that the payload always has a fixed number of contributions by adding (0,0) contributions. See #56 for more discussion.
Ensures that the payload always has a fixed number of contributions by adding (0,0) contributions. See #56 for more discussion.
Ensures that the payload always has a fixed number of contributions by adding (0,0) contributions. See #56 for more discussion.
While the payload is now padded with null contributions, leaving this open to track the proposal of explicitly padding to a certain number of bytes |
Opened #116 to track improving the payload size efficiency separately. |
See https://github.com/patcg-individual-drafts/private-aggregation-api#padding
The text was updated successfully, but these errors were encountered: