Investigate input of overlong UTF-8 sequences #932

Open
jridderbusch opened this issue Apr 3, 2024 · 5 comments
Labels
task/analyze Need for investigation

Comments

@jridderbusch
Contributor

jridderbusch commented Apr 3, 2024

Task

Description

Investigate how the backend behaves when input contains overlong UTF-8 sequences, and check whether string validation can be bypassed. This should be fine, since Java decodes UTF-8 to UTF-16 before exposing it as strings, but I am not sure whether the JSON parser reads the UTF-8 byte stream directly.
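
To make the assumption about Java's decoding concrete, here is a minimal sketch of what the JDK's own UTF-8 decoder does with the bytes 0xC0 0xA0, an overlong form of the space character. It only covers `new String(byte[], Charset)` and says nothing about a JSON parser that decodes the byte stream itself, which is the open question. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class JdkDecodeSketch {
    // Decodes raw bytes the way most application code does. The String
    // constructor replaces malformed input with U+FFFD instead of throwing.
    static String decodeLeniently(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Overlong two-byte form of U+0020 (space).
        byte[] overlongSpace = {(byte) 0xC0, (byte) 0xA0};
        String decoded = decodeLeniently(overlongSpace);
        System.out.println("contains space:  " + decoded.contains(" "));
        System.out.println("contains U+FFFD: " + decoded.contains("\uFFFD"));
    }
}
```

On current JDKs the decoded string should contain replacement characters rather than a space, so validation on the resulting `String` would not see a hidden U+0020.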

Stakeholders

@sybereal

Solution Proposal and Work Breakdown

@jridderbusch jridderbusch added the kind/enhancement New feature or request label Apr 3, 2024
@illfixit
Collaborator

illfixit commented Apr 5, 2024

We have standard Angular validators for the form fields. They seem to be well tested and handle such symbols correctly.

@sybereal
Collaborator

sybereal commented Apr 5, 2024

I believe there may have been a misunderstanding here.

UTF-8's design theoretically allows code points to be represented in different ways. Overlong UTF-8 sequences use more bytes than strictly required, while still decoding to the same code point. For example, the ASCII space character (U+0020) is normally encoded as the single byte 0x20. However, a decoder that applies the UTF-8 bit-extraction rules without checking for the shortest form will also return U+0020 when given 0xC0 0xA0. The Unicode standard and RFC 3629 forbid such encodings, so a conforming decoder must reject them, but not every decoder does. [1][2]
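
As a rough illustration of the arithmetic, the following sketch decodes a two-byte sequence by plain bit extraction and then applies the shortest-form check that a lenient decoder would be missing. The class and method names are hypothetical and not taken from any library.

```java
public class OverlongDecodeSketch {
    // Naive two-byte decode: takes 5 payload bits from the lead byte and
    // 6 from the continuation byte, with no shortest-form check.
    static int naiveDecodeTwoBytes(int lead, int continuation) {
        return ((lead & 0x1F) << 6) | (continuation & 0x3F);
    }

    // A two-byte sequence is overlong when its code point is below U+0080,
    // i.e. it would have fit into a single byte.
    static boolean isOverlongTwoByte(int lead, int continuation) {
        return naiveDecodeTwoBytes(lead, continuation) < 0x80;
    }

    public static void main(String[] args) {
        int codePoint = naiveDecodeTwoBytes(0xC0, 0xA0);
        // prints "U+0020 overlong=true"
        System.out.printf("U+%04X overlong=%b%n", codePoint, isOverlongTwoByte(0xC0, 0xA0));
    }
}
```

A legitimate two-byte sequence such as 0xC3 0xA9 (U+00E9) passes the same check, since its code point is at or above U+0080.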

The concern is that, if software operates directly on UTF-8-encoded strings, such encodings could potentially be used to bypass validation checks. In the above case of the space character, a validation that checks if a certain input does not contain whitespace may naively look only for the byte 0x20, which can cause it to miss certain occurrences if input is not normalized beforehand.
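
A minimal sketch of that bypass, assuming a hypothetical validator that must reject any input containing whitespace: the byte-level check lets the overlong space through, while a check that first decodes strictly rejects the input. The names are illustrative only and this is not code from our backend.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class WhitespaceCheckSketch {
    // Naive check on raw bytes: only looks for the single byte 0x20.
    static boolean naiveHasNoSpace(byte[] raw) {
        for (byte b : raw) {
            if (b == 0x20) {
                return false;
            }
        }
        return true;
    }

    // Stricter check: reject malformed UTF-8 outright, then validate the decoded text.
    static boolean strictHasNoWhitespace(byte[] raw) {
        try {
            String text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return text.chars().noneMatch(Character::isWhitespace);
        } catch (CharacterCodingException e) {
            // Malformed input, which includes overlong forms, is treated as invalid.
            return false;
        }
    }

    public static void main(String[] args) {
        // "a", overlong space (0xC0 0xA0), "b"
        byte[] input = {0x61, (byte) 0xC0, (byte) 0xA0, 0x62};
        System.out.println("naive check passes:  " + naiveHasNoSpace(input));
        System.out.println("strict check passes: " + strictHasNoWhitespace(input));
    }
}
```

The design choice here is to fail closed: malformed input is rejected before any content rule is applied, so there is no need to normalize overlong forms at all.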

Since this concerns input validation, I believe it is a backend issue, rather than (just) a frontend issue.

Footnotes

  1. https://stackoverflow.com/a/7113150

  2. https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings

@illfixit
Collaborator

illfixit commented Apr 5, 2024

Thank you for the information!

@sybereal sybereal changed the title Investigate input of long UTF-8 sequences Investigate input of overlong UTF-8 sequences Apr 8, 2024
@jridderbusch jridderbusch transferred this issue from sovity/authority-portal Apr 8, 2024
@AbdullahMuk AbdullahMuk added the clean-backlog requires backlog cleaning label May 2, 2024
@sybereal sybereal assigned ununhexium and unassigned sybereal May 2, 2024
@ununhexium ununhexium removed the clean-backlog requires backlog cleaning label May 29, 2024
@ununhexium ununhexium transferred this issue from sovity/edc-broker-server-extension May 29, 2024
@SebastianOpriel
Member

Is this really an issue for our repo, or should it be addressed in Core EDC? Cc @efiege

@sybereal
Collaborator

Both, since we would have to investigate the behavior of both upstream and our custom code.

@SebastianOpriel SebastianOpriel added task/analyze Need for investigation and removed kind/enhancement New feature or request labels Nov 20, 2024