Include S3 object content type in attributes #700

glenrobson · 2024-09-11T13:45:45Z

The ContentTypeHeaderChecker is invoked after the NameChecker, but the contentType is never set when the attributes are fetched from S3. This patch fixes that.

Imported branch from #676 and added specific imports back.

glenrobson · 2024-09-11T13:55:11Z

Question from the previous meeting:

Now that this is fixed, what happens to the existing behaviour, which may rely on the filename extension?

According to the documentation (and a brief look at the code), the process appears to be:

 * <ol>
 *     <li>If the object key has a recognized filename extension, the
 *     format is inferred from that.</li>
 *     <li>Otherwise, if the source image's URI identifier has a recognized
 *     filename extension, the format will be inferred from that.</li>
 *     <li>Otherwise, a {@literal GET} request will be sent with a
 *     {@literal Range} header specifying a small range of data from the
 *     beginning of the resource.
 *         <ol>
 *             <li>If a {@literal Content-Type} header is present in the
 *             response, and is specific enough (i.e. not {@literal
 *             application/octet-stream}), a format will be inferred from
 *             that.</li>
 *             <li>Otherwise, a format is inferred from the magic bytes in
 *             the response body.</li>
 *         </ol>
 *     </li>
 * </ol>

So fixing this issue only changes the behaviour when the format cannot be inferred from a filename extension (of either the object key or the identifier).
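To make the ordering concrete, here is a rough sketch of the fallback chain described above. It is not the actual Cantaloupe code: the class `FormatInferenceSketch`, its method names, the tiny extension table and the two magic-byte signatures are all hypothetical stand-ins, and the content type and leading bytes are passed in directly rather than fetched with a ranged GET.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of the documented inference order; not Cantaloupe's classes. */
public class FormatInferenceSketch {

    // Illustrative table only; the real application recognizes many more formats.
    static final Map<String, String> MEDIA_TYPES_BY_EXTENSION = Map.of(
            "jpg", "image/jpeg",
            "png", "image/png",
            "tif", "image/tiff");

    /** Steps 1 and 2: look for a recognized filename extension. */
    static Optional<String> fromExtension(String name) {
        if (name == null) {
            return Optional.empty();
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MEDIA_TYPES_BY_EXTENSION.get(ext));
    }

    /** Step 3a: use the Content-Type only if it names a format we know. */
    static Optional<String> fromContentType(String contentType) {
        if (contentType == null
                || contentType.equals("application/octet-stream")
                || !MEDIA_TYPES_BY_EXTENSION.containsValue(contentType)) {
            return Optional.empty();
        }
        return Optional.of(contentType);
    }

    /** Step 3b: fall back to magic bytes (only JPEG and PNG in this sketch). */
    static Optional<String> fromMagicBytes(byte[] head) {
        if (head != null && head.length >= 3
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF) {
            return Optional.of("image/jpeg");
        }
        if (head != null && head.length >= 4
                && (head[0] & 0xFF) == 0x89
                && head[1] == 'P' && head[2] == 'N' && head[3] == 'G') {
            return Optional.of("image/png");
        }
        return Optional.empty();
    }

    /** Tries each strategy in the documented order and returns the first hit. */
    static String infer(String objectKey, String identifier,
                        String contentType, byte[] head) {
        return fromExtension(objectKey)
                .or(() -> fromExtension(identifier))
                .or(() -> fromContentType(contentType))
                .or(() -> fromMagicBytes(head))
                .orElse("unknown");
    }
}
```

In this sketch an object key such as `a/b/photo.jpg` never reaches the content-type step, while an extensionless key only benefits from the content type if that value was actually populated, which is the gap this PR closes.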

glenrobson · 2024-09-11T14:21:34Z

Some more information here:

cantaloupe/src/main/java/edu/illinois/library/cantaloupe/source/Source.java

Line 71 in f492eac

/**
     * <p>Returns an iterator over the results of various techniques of
     * checking the format, in the order of least to most expensive. Any of the
     * calls to {@link Iterator#next()} may return either an inaccurate
     * value, or {@link Format#UNKNOWN}. Clients should proceed using the first
     * non-unknown format they encounter and, if this turns out to be wrong,
     * iterate and try again.</p>
     *
     * @return Iterator over whatever format-inference strategies the instance
     *         supports. <strong>The instance is cached and the same one is
     *         returned every time.</strong>
     * @since 5.0
     */
    Iterator<Format> getFormatIterator();
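
As an illustration of the contract in that Javadoc, a client loop might look roughly like the sketch below. The `Format` enum here is a simplified stand-in for Cantaloupe's own `Format` type, and `firstUsable` and the `canRead` predicate are hypothetical names used only to show the "take the first non-unknown format, and iterate again if it turns out to be wrong" pattern.

```java
import java.util.Iterator;
import java.util.function.Predicate;

/** Hypothetical client of a format iterator; not part of Cantaloupe. */
public class FormatIteratorClientSketch {

    // Simplified stand-in for the real Format type.
    enum Format { UNKNOWN, JPG, PNG }

    /**
     * Walks the iterator from cheapest to most expensive strategy, skipping
     * UNKNOWN results and moving on when a guessed format cannot be read.
     */
    static Format firstUsable(Iterator<Format> formats,
                              Predicate<Format> canRead) {
        while (formats.hasNext()) {
            Format candidate = formats.next();
            if (candidate == Format.UNKNOWN) {
                continue; // this strategy could not decide; try the next one
            }
            if (canRead.test(candidate)) {
                return candidate; // the guess held up
            }
            // The guess was inaccurate; fall through to the next strategy.
        }
        return Format.UNKNOWN;
    }
}
```

With a populated content type, the header-based strategy can return a real format instead of `UNKNOWN`, so a loop like this would stop before reaching the more expensive checks.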

garyttierney and others added 2 commits August 15, 2024 01:37

Include S3 object content type in attributes

155c3e4

Adding specific imports back

0cc9057

glenrobson mentioned this pull request Sep 11, 2024

Include S3 object content type in attributes #676

Closed

Merge branch 'develop' into bugfix/s3-content-type

08ff26a

ksclarke approved these changes Sep 11, 2024

ksclarke merged commit 1bffd1f into develop Sep 11, 2024
10 checks passed

ksclarke deleted the bugfix/s3-content-type branch September 11, 2024 15:09
