JAXB encoding implementation change between Java 8 and 11 #1818

Open
jarrettchisholm opened this issue Oct 11, 2024 · 0 comments

Hello,

We are seeing a difference between our old Java 8 code and our newer Java 11 code in how control characters are encoded when marshaling an object to XML, specifically the tab character ('\t').

The issue seems to arise when we encode a tab character as part of an attribute.

For a sample object, if we try to add the text thecomment\r\n\tthecomment as a value and as an attribute, Java 8 produces something like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><testClass attribute="thecomment&#xD;&#xA;&#x9;thecomment"><testElement>thecomment&#xD; thecomment</testElement></testClass>

However, Java 11 produces something like this:

"<?xml version="1.0" encoding="UTF-8" standalone="yes"?><testClass attribute="thecomment&#13;&#10; thecomment"><testElement>thecomment&#13;
thecomment</testElement></testClass>"

For the testClass node, the attribute named attribute has the tab character encoded in Java 8 but not in Java 11. We are also seeing this with Java 21.

The issue occurs with com.sun.xml.bind:jaxb-impl:2.3.2 and com.sun.xml.bind:jaxb-core:2.3.0. However, if I change the versions to com.sun.xml.bind:jaxb-impl:2.2.11 and com.sun.xml.bind:jaxb-core:2.2.11, the problem does not appear.

Looking through the source code, it looks like the commit that introduced this issue is 9df41fb9bd588294c9e2aa076ad382d5160128a9 (javaee/jaxb-v2@9df41fb).

It seems like it was a fix for another bug (Bug 25348784?). Possibly related:

https://bugs.openjdk.org/browse/JDK-8172297?jql=labels%20%3D%20bugdb_25348784
https://bugs.openjdk.org/browse/JDK-8176508

Does this seem correct? I would have thought the tab character would still be escaped. However, the implementation of MinimumEscapeHandler does not seem to handle the tab character (this was the escape handler in use in our case).
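For reference, here is a rough sketch of the escaping behavior we expected, written as a standalone helper so it runs without JAXB on the classpath. TabAwareEscaper is a hypothetical name and not part of JAXB. The method signature is modeled on com.sun.xml.bind.marshaller.CharacterEscapeHandler#escape, on the assumption that the body could be moved into an implementation of that interface. The tab and line feed are written as character references only in attribute values, since an XML parser normalizes literal whitespace in attributes to spaces.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper, not part of JAXB. The parameters mirror
// CharacterEscapeHandler#escape(char[], int, int, boolean, Writer).
final class TabAwareEscaper {

    static void escape(char[] ch, int start, int length, boolean isAttVal, Writer out)
            throws IOException {
        int limit = start + length;
        for (int i = start; i < limit; i++) {
            char c = ch[i];
            switch (c) {
                case '&':
                    out.write("&amp;");
                    break;
                case '<':
                    out.write("&lt;");
                    break;
                case '>':
                    out.write("&gt;");
                    break;
                case '"':
                    // Quotes only need escaping inside attribute values.
                    if (isAttVal) out.write("&quot;"); else out.write(c);
                    break;
                case '\t':
                case '\n':
                    // Literal tab/LF in an attribute would be normalized to a
                    // space on parsing, so emit a character reference there.
                    if (isAttVal) writeCharRef(c, out); else out.write(c);
                    break;
                case '\r':
                    // A literal CR is lost to line-end normalization in both
                    // attributes and element content.
                    writeCharRef(c, out);
                    break;
                default:
                    out.write(c);
            }
        }
    }

    // Writes c as an upper-case hexadecimal character reference, e.g. &#x9;
    private static void writeCharRef(char c, Writer out) throws IOException {
        out.write("&#x");
        out.write(Integer.toHexString(c).toUpperCase());
        out.write(';');
    }
}
```

Called with isAttVal set to true on thecomment\r\n\tthecomment, this yields the same attribute text as the Java 8 output above. Whether a custom handler registered on the Marshaller is actually consulted for a given output path in 2.3.x is something I have not verified.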

I have attached a zip file with the entire minimal reproducible example using Java 11. You can switch back and forth between JAXB library versions to toggle the issue on and off.

java11.zip

Cheers
