Commit

Merge pull request #3727 from handrews/encodings
Clarify how to model binary data in 3.1
miqui authored Apr 28, 2024
2 parents ddbd53f + 8de5a93 commit 5e48c67
Showing 1 changed file with 60 additions and 16 deletions.
76 changes: 60 additions & 16 deletions versions/3.1.1.md
@@ -170,6 +170,40 @@ The formats defined by the OAS are:
`number` | `double` | |
`string` | `password` | A hint to obscure the value.

#### <a name="binaryData"></a>Working With Binary Data

The OAS can describe either _raw_ or _encoded_ binary data.

* **raw binary** is used where unencoded binary data is allowed, such as when sending a binary payload as the entire HTTP message body, or as part of a `multipart/*` payload that allows binary parts.
* **encoded binary** is used where binary data is embedded in a text-only format such as `application/json` or `application/x-www-form-urlencoded` (either as a message body or in the URL query string).

In the following table showing how to use Schema Object keywords for binary data, we use `image/png` as an example binary media type. Any binary media type, including `application/octet-stream`, is sufficient to indicate binary content.

Keyword | Raw | Encoded | Comments
------- | --- | ------- | --------
`type` | _omit_ | `string` | raw binary is [outside of `type`](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.3)
`contentMediaType` | `image/png` | `image/png` | can sometimes be omitted if redundant (see below)
`contentEncoding` | _omit_ | `base64`&nbsp;or&nbsp;`base64url` | other encodings are [allowed](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-8.3)

Note that the encoding indicated by `contentEncoding` is unrelated to HTTP's `Content-Encoding` header. `contentEncoding` inflates the size of the data in order to represent it as 7-bit ASCII text, whereas `Content-Encoding` indicates whether and how a message body has been compressed, and is applied after all content serialization described in this section has occurred. Since HTTP allows unencoded binary message bodies, there is no standardized HTTP header for indicating base64 or similar encoding of an entire message body.

Using a `contentEncoding` of `base64url` ensures that URL encoding (as required in the query string and in message bodies of type `application/x-www-form-urlencoded`) does not need to further encode any part of the already-encoded binary data.
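
As a minimal sketch of the cases described so far, a single operation might accept raw binary as the whole body, base64-encoded binary embedded in JSON, and URL-safe encoded binary in the query string. The `/icons` path, the `thumbnail` parameter, and the `icon` property are hypothetical names chosen only for illustration:

```yaml
paths:
  /icons:                      # hypothetical path
    post:
      parameters:
        # Encoded binary in the query string, using the URL-safe alphabet
        - name: thumbnail      # hypothetical parameter
          in: query
          schema:
            type: string
            contentMediaType: image/png
            contentEncoding: base64url
      requestBody:
        content:
          # Raw binary as the entire message body: no type, no contentEncoding
          image/png: {}
          # Encoded binary embedded in a text-only format
          application/json:
            schema:
              type: object
              properties:
                icon:          # hypothetical property
                  type: string
                  contentMediaType: image/png
                  contentEncoding: base64
      responses:
        '204':
          description: Icon stored
```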

The `contentMediaType` keyword is redundant if the media type is already set:

* as the key for a [`Media Type Object`](#mediaTypeObject)
* in the `contentType` field of an [`Encoding Object`](#encodingObject)

If the Schema Object will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
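
To illustrate the two situations, the sketch below defines two request bodies, both keyed by `image/png`. The names `RedundantPng` and `ContradictoryPng` are hypothetical. In the first, `contentMediaType` merely repeats the key; in the second it contradicts the key and is therefore ignored:

```yaml
components:
  requestBodies:
    # contentMediaType repeats the Media Type Object key; harmless, and
    # possibly useful to JSON Schema tools unaware of the OpenAPI context
    RedundantPng:
      content:
        image/png:
          schema:
            contentMediaType: image/png
    # contentMediaType contradicts the key, so it is ignored
    # and the payload is still described as image/png
    ContradictoryPng:
      content:
        image/png:
          schema:
            contentMediaType: image/jpeg
```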

The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type:

OAS < 3.1 | OAS 3.1 | Comments
--------- | ------- | --------
`type: string`<br />`format: binary` | `contentMediaType: image/png` | if redundant, can be omitted, often resulting in an empty Schema Object
`type: string`<br />`format: byte` | `type: string`<br />`contentMediaType: image/png`<br />`contentEncoding: base64` | note that `base64url` can be used to avoid re-encoding the base64 string to be URL-safe
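
As an illustrative sketch of this migration, the two YAML documents below show the same pair of schemas before and after. The schema names `RawImage` and `EmbeddedImage` are hypothetical:

```yaml
# OAS 3.0 style (before)
components:
  schemas:
    RawImage:
      type: string
      format: binary
    EmbeddedImage:
      type: string
      format: byte
---
# OAS 3.1 style (after)
components:
  schemas:
    RawImage:
      # may be omitted entirely when the media type is already set elsewhere
      contentMediaType: image/png
    EmbeddedImage:
      type: string
      contentMediaType: image/png
      contentEncoding: base64
```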


### <a name="richText"></a>Rich Text Formatting
Throughout the specification `description` fields are noted as supporting CommonMark markdown formatting.
Where OpenAPI tooling renders rich text it MUST support, at a minimum, markdown syntax as described by [CommonMark 0.27](https://spec.commonmark.org/0.27/). Tooling MAY choose to ignore some CommonMark features to address security concerns.
@@ -1458,9 +1492,7 @@ application/json:

In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type.

In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. JSON Schema offers a `contentEncoding` keyword, which may be used to specify the `Content-Encoding` for the schema. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7). The encoding specified by the `contentEncoding` keyword is independent of an encoding specified by the `Content-Type` header in the request or response or metadata of a multipart body -- when both are present, the encoding specified in the `contentEncoding` is applied first and then the encoding specified in the `Content-Type` header.

JSON Schema also offers a `contentMediaType` keyword. However, when the media type is already specified by the Media Type Object's key, or by the `contentType` field of an [Encoding Object](#encodingObject), the `contentMediaType` keyword SHALL be ignored if present.
In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. Instead, JSON Schema's `contentEncoding` and `contentMediaType` keywords are used. See [Working With Binary Data](#binaryData) for how to model various scenarios with these keywords, and how to migrate from the previous `format` usage.

Examples:

@@ -1478,19 +1510,6 @@ content:
  application/octet-stream: {}
```

Binary content transferred with base64 encoding:

```yaml
content:
  image/png:
    schema:
      type: string
      contentMediaType: image/png
      contentEncoding: base64
```

Note that the `Content-Type` remains `image/png`, describing the semantics of the payload. The JSON Schema `type` and `contentEncoding` fields explain that the payload is transferred as text. The JSON Schema `contentMediaType` is technically redundant, but can be used by JSON Schema tools that may not be aware of the OpenAPI context.

These examples apply to either input payloads of file uploads or response payloads.

A `requestBody` for submitting a file in a `POST` operation may look like the following example:
@@ -1567,6 +1586,8 @@ When passing in `multipart` types, boundaries MAY be used to separate sections o

Per the JSON Schema specification, `contentMediaType` without `contentEncoding` present is treated as if `contentEncoding: identity` were present. While useful for embedding text documents such as `text/html` into JSON strings, it is not useful for a `multipart/form-data` part, as it just causes the document to be treated as `text/plain` instead of its actual media type. Use the Encoding Object without `contentMediaType` if no `contentEncoding` is required.

Note that only `multipart/*` media types with named parts can be described as shown here. Note also that while `multipart/form-data` originally defined a per-part `Content-Transfer-Encoding` header that could indicate base64 encoding (`contentEncoding: base64`), it has been deprecated for use with HTTP as of [RFC7578](https://www.rfc-editor.org/rfc/rfc7578#section-4.7).

Examples:

```yaml
@@ -1620,6 +1641,8 @@ This object MAY be extended with [Specification Extensions](#specificationExtens

##### Encoding Object Example

`multipart/form-data` allows for binary parts:

```yaml
requestBody:
  content:
@@ -1655,6 +1678,27 @@ requestBody:
type: integer
```

`application/x-www-form-urlencoded` is a text format, which requires base64-encoding any binary data:

```YAML
requestBody:
  content:
    application/x-www-form-urlencoded:
      schema:
        type: object
        properties:
          name:
            type: string
          icon:
            # default for type string is text/plain, need to declare
            # the appropriate contentType in the Encoding Object
            type: string
            contentEncoding: base64url
      encoding:
        icon:
          contentType: image/png, image/jpeg
```

#### <a name="responsesObject"></a>Responses Object

A container for the expected responses of an operation.