Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuation of #107 UnsupportedEncodingException #147

Open
davecrighton opened this issue May 8, 2024 · 8 comments
Open

Continuation of #107 UnsupportedEncodingException #147

davecrighton opened this issue May 8, 2024 · 8 comments
Assignees
Labels
duplicate This issue or pull request already exists

Comments

@davecrighton
Copy link

Describe the bug

We have seen the same customer come back with a re-occurrence of #107 but this time using different unsupported code page iso-8859-8-i ( https://en.wikipedia.org/wiki/ISO-8859-8-I )

I think @jmehrens comment in #107 about being more flexible with lookups is likely to be a better solution than adding another mapping, perhaps with the option to specify a fallback code page where the name is not recognized.

We're happy to work on a PR for this but it might take some time to get the contributor agreements signed off our side and we would like guidance on the preferred approach. Specifically when building the suggested table based lookup what is the expected source of charset names?

Would it be acceptable to add something like icu4j as a dependency to help with enumerating charset aliases?

To Reproduce
Create a mail with 'Content-Type: text/html; charset="iso-8859-8-i"
Invoke getContent() API on the email and it fails.

Expected behavior
To process the email even if it uses charset="iso-8859-8-i"

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Linux
  • Browser n/a
  • Version 2.0.3

Smartphone (please complete the following information):

n/a

Mail server:

  • Protocol being used: pop3
  • Vendor/product: outlook
  • Mail service URL: customer onPrem

Additional context
Add any other context about the problem here.

@jmehrens
Copy link
Contributor

jmehrens commented May 8, 2024

This is an old issue from jakartaee/mail-api#302. I think the next steps are to ask openjdk team to provide an evaluation of that openjdk ticket. That will then align the approach for applying a patch here regardless of timelines over at openjdk. There is a lot we can do but I want their input too. I can ask the openjdk team to provide their view.

We can't add icu4j to Angus-mail. We are interested in contributions. You'll have to sign an eclipse agreement. I can answer remaining questions after evaluation of the openjdk ticket.

@jmehrens
Copy link
Contributor

jmehrens commented May 8, 2024

@davecrighton
Copy link
Author

Thanks for the response. We actually encountered this on the IBM JRE (although suspect this may be common code in the class library) so we will try engaging internally to see if we can get any response on that linked defect. I'll keep you posted if I manage to get any traction.

@jmehrens
Copy link
Contributor

jmehrens commented May 8, 2024

Awesome! Thanks. Let me know if you want me to engage on the public openjdk list. I'm thinking if they add an alias to openjdk then angus-mail can do the same approach as PR 107. Otherwise, if we keep having this come up we can look at new approach.

@psawant19
Copy link

Hi @jmehrens, have you already contacted the OpenJDK community regarding ticket JDK-8195686? If so, what was the response from community and we’re curious if there’s a specific reason why this ticket remains unresolved until today.

@jmehrens
Copy link
Contributor

jmehrens commented Jun 8, 2024

Hi @psawant19, the main reason this is unresolved is that I'm continuing the approach that Bill Shannon was taking in the linked ticket. Usually accepted means that there is a comment in the OpenJDK ticket from the team at Oracle that confirms it is a real bug.
I didn't contact OpenJDK on this issue yet. My understanding was that @davecrighton was going to address it through other channels. Given the time that has passed I'll engage on the OpenJDK mailing list. Once there is an answer there I can move forward with this ticket. I'll post back on this issue with OpenJDK thread when I open it.

@jmehrens
Copy link
Contributor

@davecrighton openjdk/jdk#20690 has been opened. If you have any comments then add them to that PR.

@jmehrens jmehrens added the duplicate This issue or pull request already exists label Sep 17, 2024
@jmehrens jmehrens self-assigned this Sep 17, 2024
@jmehrens
Copy link
Contributor

Per OpenJDK (Naoto Sato):

I looked at this issue a bit more. Looking at the IANA Charset registry (https://www.iana.org/assignments/character-sets/character-sets.xhtml) which Charset class is based on, ISO-8859-8-I is not an alias to ISO-8859-8, but it is defined as a distinct Preferred MIME name. So I don't think current proposed solution is correct. (It would return ISO-8859-8-I as an alias to ISO-8859-8). Also, looking at the RFC-1556, in which this ISO-8859-8-I encoding is defined, there are other encodings, i.e., ISO-8859-6-I, ISO-8859-6-E, and ISO-8859-8-E.

Per Wikipedia link above:

The WHATWG Encoding Standard used by HTML5 treats ISO-8859-8 and ISO-8859-8-I as distinct encodings with the same mapping due to influence on the layout direction.

On the OpenJDK side I think the approach would be to create a new Charset/CharsetEncoder/CharsetDecoder for ISO-8859-8-I. Given what Wikipedia states the implementation is already in the OpenJDK. We would literally just have to create the new class names with the same implementation and adjust the aliases so ISO-8859-8-I doesn't alias ISO-8859-8 but actually loads the 'new' implementation.

Adding the alias in AngusMail is not the right approach because once that mapping is present it will override any actual implementation in the OpenJDK including 3rd party implementations. The AngusMail/JakartaMail alias code doesn't pre-screen the charset name to see if an implementation exists in the JDK.

For this ticket recommend following the advice in the FAQ for Why do I get the UnsupportedEncodingException when I invoke getContent() on a bodypart that contains text data?:

  1. Use http://www.freeutils.net/source/jcharset/
  2. Extend CharsetProvider to create a unique charset ISO-8859-8-I that uses ISO-8859-8 internally.

I'll leave this ticket open to think on if there is anything else we can do on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

3 participants