The current `unquotePrintable` function does not correctly support 4-byte Unicode characters and fails to parse multiple concatenated Unicode character sequences, such as `=C9=91=E2=8D=BA` (ɑ⍺: a 2-byte character followed by a 3-byte character). The function incorrectly parses this input as `ɑ���`.
To resolve this issue, we need to:
- Extend the function to support 4-byte Unicode characters.
- Enable the function to correctly handle multiple concatenated Unicode characters.
A possible solution is to use the first byte of each UTF-8 encoded character to determine how many bytes it occupies, as described in the IBM documentation. We can implement a recursive helper function that takes the entire byte sequence, determines the length of the next character from its first byte, decodes that character, and then calls itself on the remaining bytes.
This enhancement will ensure that the `unquotePrintable` function properly handles various Unicode character sequences, allowing for more accurate parsing and processing of text data.
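
The approach proposed above could look roughly like the following. This is a minimal sketch, not the library's actual code: `utf8SequenceLength`, `decodeUtf8Bytes`, and `unquotePrintableSketch` are hypothetical names introduced here for illustration, and the sketch assumes the input is UTF-8 encoded quoted-printable text. It checks lead and continuation bytes but does not reject overlong encodings.

```typescript
// Hypothetical helper: length of a UTF-8 sequence, judged from its lead byte.
function utf8SequenceLength(leadByte: number): number {
  if ((leadByte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((leadByte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((leadByte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((leadByte & 0xf8) === 0xf0) return 4; // 11110xxx
  return 0; // continuation byte or invalid lead byte
}

// Hypothetical recursive decoder: decodes one character, then recurses on the rest.
function decodeUtf8Bytes(bytes: number[], start: number = 0): string {
  if (start >= bytes.length) return "";

  const lead = bytes[start];
  const length = utf8SequenceLength(lead);

  // Invalid lead byte or truncated sequence: emit U+FFFD and skip one byte.
  if (length === 0 || start + length > bytes.length) {
    return "\uFFFD" + decodeUtf8Bytes(bytes, start + 1);
  }

  // Keep only the payload bits of the lead byte (7, 5, 4, or 3 bits).
  let codePoint = length === 1 ? lead : lead & (0xff >> (length + 1));
  for (let i = 1; i < length; i++) {
    const next = bytes[start + i];
    // Every following byte must look like 10xxxxxx.
    if ((next & 0xc0) !== 0x80) {
      return "\uFFFD" + decodeUtf8Bytes(bytes, start + 1);
    }
    codePoint = (codePoint << 6) | (next & 0x3f);
  }

  // Guard against values beyond the Unicode range (e.g. lead bytes F5..F7).
  if (codePoint > 0x10ffff) {
    return "\uFFFD" + decodeUtf8Bytes(bytes, start + length);
  }
  return String.fromCodePoint(codePoint) + decodeUtf8Bytes(bytes, start + length);
}

// Hypothetical stand-in for unquotePrintable: decodes each run of =XX escapes.
function unquotePrintableSketch(input: string): string {
  return input
    .replace(/=\r?\n/g, "") // drop quoted-printable soft line breaks
    .replace(/(?:=[0-9A-Fa-f]{2})+/g, (run: string) => {
      const bytes = run
        .split("=")
        .slice(1) // the first element is the empty string before the first "="
        .map((hex: string) => parseInt(hex, 16));
      return decodeUtf8Bytes(bytes);
    });
}
```

With this sketch, `unquotePrintableSketch("=C9=91=E2=8D=BA")` yields `ɑ⍺`, and a 4-byte sequence such as `=F0=9F=98=80` decodes to a single code point (U+1F600). The recursion mirrors the proposal; for very long escape runs, an iterative loop over the byte array would avoid deep call stacks.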
joaoaugustogrobe changed the title from "4 bytes unicode characters" to "Extend unquotePrintable function to support 4-byte Unicode characters and concatenated sequences" on Apr 5, 2023