Include S3 object content type in attributes #700

Merged
merged 3 commits into from
Sep 11, 2024


glenrobson
Contributor

The ContentTypeHeaderChecker is invoked after the NameChecker, but the contentType is never set when the attributes are fetched from S3. This patch fixes that.

Imported branch from #676 and added specific imports back.

@glenrobson
Contributor Author

Question from the previous meeting:

Now that this is fixed, what happens to the existing behaviour, which may rely on the filename extension?

According to the documentation (and a brief look at the code), the process appears to be:

 * <ol>
     *     <li>If the object key has a recognized filename extension, the
     *     format is inferred from that.</li>
     *     <li>Otherwise, if the source image's URI identifier has a recognized
     *     filename extension, the format will be inferred from that.</li>
     *     <li>Otherwise, a {@literal GET} request will be sent with a
     *     {@literal Range} header specifying a small range of data from the
     *     beginning of the resource.
     *         <ol>
     *             <li>If a {@literal Content-Type} header is present in the
     *             response, and is specific enough (i.e. not {@literal
     *             application/octet-stream}), a format will be inferred from
     *             that.</li>
     *             <li>Otherwise, a format is inferred from the magic bytes in
     *             the response body.</li>
     *         </ol>
     *     </li>
     * </ol>

So fixing this issue will only change the behaviour when the format cannot be inferred from a filename extension, either on the object key or on the identifier.
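
To make the ordering concrete, here is a minimal sketch of the fallback chain described in the quoted documentation. It is not the project's actual code: `FormatInferenceSketch`, its method names, the three-entry extension table, and the use of media-type strings as "formats" are all assumptions for illustration. The real implementation issues the ranged GET lazily, whereas this sketch simply takes the header and leading bytes as arguments.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the documented inference order; not the real classes. */
final class FormatInferenceSketch {

    // Illustrative table only; the real server recognises many more extensions.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "jpg", "image/jpeg",
            "png", "image/png",
            "tif", "image/tiff");

    /** Steps 1 and 2: infer from the extension of an object key or identifier. */
    static Optional<String> fromExtension(String name) {
        if (name == null) {
            return Optional.empty();
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    /** Step 3.1: accept a Content-Type only if it is specific enough. */
    static Optional<String> fromContentType(String contentType) {
        if (contentType == null || contentType.isBlank()
                || contentType.equalsIgnoreCase("application/octet-stream")) {
            return Optional.empty();
        }
        return Optional.of(contentType);
    }

    /** Step 3.2: fall back to magic bytes (only JPEG and PNG are checked here). */
    static Optional<String> fromMagicBytes(byte[] head) {
        if (head != null && head.length >= 3
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF) {
            return Optional.of("image/jpeg");
        }
        if (head != null && head.length >= 4
                && (head[0] & 0xFF) == 0x89
                && head[1] == 'P' && head[2] == 'N' && head[3] == 'G') {
            return Optional.of("image/png");
        }
        return Optional.empty();
    }

    /** Tries each technique in the documented order and stops at the first hit. */
    static String infer(String objectKey, String identifier,
                        String contentType, byte[] head) {
        return fromExtension(objectKey)
                .or(() -> fromExtension(identifier))
                .or(() -> fromContentType(contentType))
                .or(() -> fromMagicBytes(head))
                .orElse("unknown");
    }
}
```

In this sketch, a key such as `a/b.jpg` never reaches the Content-Type step, so the S3 content type only matters for keys and identifiers without a recognized extension.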

@glenrobson
Contributor Author

Some more information here:

/**
     * <p>Returns an iterator over the results of various techniques of
     * checking the format, in the order of least to most expensive. Any of the
     * calls to {@link Iterator#next()} may return either an inaccurate
     * value, or {@link Format#UNKNOWN}. Clients should proceed using the first
     * non-unknown format they encounter and, if this turns out to be wrong,
     * iterate and try again.</p>
     *
     * @return Iterator over whatever format-inference strategies the instance
     *         supports. <strong>The instance is cached and the same one is
     *         returned every time.</strong>
     * @since 5.0
     */
    Iterator<Format> getFormatIterator();
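
As a rough illustration of how a client might follow that contract, the sketch below takes the first non-unknown format and keeps iterating if it turns out to be wrong. The `Format` enum here is a hypothetical stand-in for the real class, and `canRead` is an assumed callback representing "try to open the image as this format".

```java
import java.util.Iterator;
import java.util.function.Predicate;

final class FormatIteratorClientSketch {

    /** Hypothetical stand-in for the real Format type. */
    enum Format { UNKNOWN, JPG, PNG }

    /**
     * Walks the iterator from cheapest to most expensive technique.
     * Unknown results are skipped; a known result is accepted only if
     * the (assumed) canRead check succeeds, otherwise iteration continues.
     */
    static Format firstWorkingFormat(Iterator<Format> formats,
                                     Predicate<Format> canRead) {
        while (formats.hasNext()) {
            Format candidate = formats.next();
            if (candidate != Format.UNKNOWN && canRead.test(candidate)) {
                return candidate;
            }
        }
        return Format.UNKNOWN;
    }
}
```

Since the Javadoc says the same iterator instance is returned every time, a caller that stops early and asks for the iterator again would presumably resume from where it left off rather than restart from the cheapest check.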

@ksclarke ksclarke merged commit 1bffd1f into develop Sep 11, 2024
10 checks passed
@ksclarke ksclarke deleted the bugfix/s3-content-type branch September 11, 2024 15:09

3 participants