Add ISO IR 13 and ISO IR 87 to SpecificCharacterSet #444

9enki · 2023-11-25T04:24:43Z

issue #443

9enki · 2023-11-27T02:54:13Z

@Enet4 Hi, thanks for creating issue #443

For Japanese, Chinese or Korean, the Specific Character Set (0008,0005) may have multiple values. https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.html
https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_H.3.2.html

In that case, each "="-separated component group of the Patient Name value needs to be decoded with its own character set from Specific Character Set. The current code appears to decode the whole value with the first character set only, even when Specific Character Set has multiple values. Is my understanding correct?
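One caveat for splitting on "=": once ISO 2022 IR 87 (JIS X 0208) is invoked with ESC $ B, both bytes of each character lie in 0x21 to 0x7E, so a raw 0x3D byte may be half of a kanji rather than a delimiter. PS3.5 requires switching back to the value-1 character set before each "^" and "=" delimiter, so a splitter can track ISO 2022 designations and only split while G0 holds a single-byte set. A minimal Python sketch under that assumption (`split_component_groups` is a hypothetical helper, not part of dicom-rs):

```python
ESC = 0x1B
EQUALS = 0x3D  # "=" in ASCII / JIS X 0201 Roman


def split_component_groups(value: bytes) -> list[bytes]:
    """Split a raw PN value on "=" without splitting inside a two-byte run.

    Tracks which set is designated to G0 via ISO 2022 escape sequences:
    ESC $ F or ESC $ ( F selects a multi-byte G0 set, ESC ( F a single-byte one.
    G1 designations (ESC ) F, ESC $ ) F) leave G0 unchanged.
    """
    groups: list[bytes] = []
    start = 0
    g0_is_multibyte = False
    i = 0
    while i < len(value):
        byte = value[i]
        if byte == ESC:
            # Collect intermediate bytes (0x20-0x2F) up to the final byte.
            j = i + 1
            while j < len(value) and 0x20 <= value[j] <= 0x2F:
                j += 1
            intermediates = value[i + 1 : j]
            if intermediates in (b"$", b"$("):
                g0_is_multibyte = True
            elif intermediates == b"(":
                g0_is_multibyte = False
            i = j + 1  # skip the final byte of the escape sequence too
            continue
        if byte == EQUALS and not g0_is_multibyte:
            groups.append(value[start:i])
            start = i + 1
        i += 1
    groups.append(value[start:])
    return groups


# The "=" bytes inside ESC $ B ... ESC ( B are not treated as delimiters.
print(split_component_groups(b"A=\x1b$B==\x1b(B=B"))
```

A plain `value.split(b"=")` would cut that example into five pieces; the sketch yields three.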

If so, I would like to fix it, either in this PR or in a separate one. Could you advise where the fix should go? I have been tracing the code back from the following function:
https://github.com/ikneg/dicom-rs/blob/7538e4f16d73e8bf0a36b472e0812f2541879be9/object/src/lib.rs#L189

My guess is that the values are read from the binary data at the following location, so I added an info! log call there to print them:
https://github.com/ikneg/dicom-rs/blob/7538e4f16d73e8bf0a36b472e0812f2541879be9/object/src/mem.rs#L1514

2023-11-27T02:26:26.629565Z  INFO dicom_object::mem: next tokne: Ok(PrimitiveValue(Strs(["填塹^些灼=\u{1b}$BCf;3\u{1b}(J^\u{1b}$B9'<#\u{1b}(J "])))

However, at this point the value is already of type Strs (i.e. already decoded), so I need to understand more of the preceding code, and that is where I am stuck.
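For illustration, a small Python sketch of what single-codec decoding does to a value like this. It uses `shift_jis` as a stand-in for ISO 2022 IR 13 (an approximation) and the bytes from the Python sample in this thread: the half-width katakana decode correctly, but ESC $ B ... stays in the string as literal escape characters, consistent with the `\u{1b}$B` runs in the logged Strs value. That suggests the per-group decoding has to happen on the raw bytes, before the value becomes Strs.

```python
# Bytes produced by the Python sample in this thread: a JIS X 0201 alphabetic
# group, "=", then an ISO 2022 (JIS X 0208) ideographic group.
raw = b"\xd4\xcf\xc0\xde^\xc0\xdb\xb3=\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"

# Decode everything with the first character set only: ESC $ B ... is not
# interpreted, so the JIS X 0208 bytes come out as plain ASCII text.
single = raw.decode("shift_jis")
print(repr(single))

# The escape sequences survive as literal ESC characters in the result.
print("\x1b$B" in single)  # True
```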

9enki · 2023-11-27T03:13:30Z

>>> specification_character_set = ["shift_jis", "iso2022_jp"]
>>> "ﾔﾏﾀﾞ^ﾀﾛｳ".encode(specification_character_set[0]) + "=".encode("utf-8") + "山田^太郎".encode(specification_character_set[1])
b'\xd4\xcf\xc0\xde^\xc0\xdb\xb3=\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B'

As a sample, in Python the Patient Name value appears to be generated like this, so I want to be able to decode a value generated this way back into ﾔﾏﾀﾞ^ﾀﾛｳ=山田^太郎.
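A minimal Python sketch of that round trip, following the per-group model described above (group i decoded with the i-th Specific Character Set value). The Defined-Term-to-codec table and both function names are hypothetical. `shift_jis` only approximates ISO 2022 IR 13: the two agree on half-width katakana (0xA1 to 0xDF) and on most ASCII bytes. The full PS3.5 rules are stricter: value 1 is the active set at the start of each group, and escape sequences switch sets within it.

```python
# Hypothetical mapping from Defined Terms of Specific Character Set (0008,0005)
# to Python codec names; "shift_jis" is only an approximation of ISO 2022 IR 13.
DEFINED_TERM_TO_CODEC = {
    "": "ascii",  # an empty value 1 means the default repertoire (ISO 2022 IR 6)
    "ISO 2022 IR 6": "ascii",
    "ISO 2022 IR 13": "shift_jis",
    "ISO 2022 IR 87": "iso2022_jp",
}


def codecs_for(specific_character_set: str) -> list[str]:
    """Map a backslash-separated (0008,0005) value to one codec per value."""
    terms = [term.strip() for term in specific_character_set.split("\\")]
    try:
        return [DEFINED_TERM_TO_CODEC[term] for term in terms]
    except KeyError as err:
        raise ValueError(f"unsupported Defined Term: {err.args[0]!r}") from None


def decode_person_name(groups: list[bytes], codecs: list[str]) -> str:
    """Decode each "="-separated component group with its own codec.

    Group i uses codecs[i]; any extra groups reuse the last codec.
    """
    decoded = [
        group.decode(codecs[min(i, len(codecs) - 1)])
        for i, group in enumerate(groups)
    ]
    return "=".join(decoded)


# Usage with the bytes from the REPL sample above, already split on "=":
codecs = codecs_for("ISO 2022 IR 13\\ISO 2022 IR 87")
name = decode_person_name(
    [b"\xd4\xcf\xc0\xde^\xc0\xdb\xb3", b"\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"],
    codecs,
)
print(name)  # ﾔﾏﾀﾞ^ﾀﾛｳ=山田^太郎
```

Reusing the last codec for a third (phonetic) group is a choice made in this sketch, not something the thread specifies.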

Enet4 · 2023-11-27T08:55:08Z

Thank you for working on this @ikneg! Could you please add a few sample texts as unit tests? There should be some for other text encodings at the end of the module, so you would just need to follow the pattern there with new data.

9enki · 2023-11-27T09:41:26Z

@Enet4 Thank you for your reply. I have added a test.

9enki · 2023-11-27T09:42:08Z

Since this part of the specification (decoding a multi-valued Specific Character Set) is complicated, I would like to take it out of the scope of this PR and open another PR to discuss it.

Enet4 · 2024-12-27T09:53:09Z

This has been superseded by #614, but we ought to look into reviving the efforts of #445 eventually.

add IsoIr13 and IsoIr87

fa513c5

Enet4 added the A-lib (Area: library) and C-encoding (Crate: dicom-encoding) labels Nov 27, 2023

Enet4 self-requested a review November 27, 2023 08:53

add test

31b01d0

9enki mentioned this pull request Nov 27, 2023

PS3.5 H3.2 compliant #445

Open

rforsyth mentioned this pull request Dec 13, 2024

Add character sets: Arabic, Greek, Hebrew, Japanese, Thai, Korean #614

Merged

Enet4 closed this Dec 27, 2024
