Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for IDL wstring #2124

Merged
merged 18 commits into from
Dec 9, 2024
Merged

Conversation

eboasson
Copy link
Contributor

@eboasson eboasson commented Nov 5, 2024

I really think no-one should be using wstring but ... ROS 2 does have a wide string type, and so ros2/rmw_cyclonedds#501 runs into a problem if Cyclone doesn't do wide strings.

It probably needs more testing, I haven't yet checked how to also do this for Python, and I am still debating whether it wouldn't be better to check that the wide strings received over the network use the surrogate pairs correctly ...

Copy link
Contributor

@dpotman dpotman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks mostly fine, though there are a few missing cases in the IDL compiler and the serializer, and (I believe) an incorrect unit for serialized length of a wstring.

for (uint32_t i = 0; val[i] != L'\0'; i++)
{
assert (di < n);
if ((uint32_t) val[i] < 0x10000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (XTypes) spec isn't completely clear on supporting non-BMP characters in wide strings:

The serialized length of an object of type String shall be the number of bytes in the CDR buffer taken by the String characters. This is twice the number of characters in the string because a single character (in the Basic Multilingual Plane) encoded using UTF-16 takes 2 bytes to serialize.

But section 7.2.2.2.1.2.7 on String<Char16> does not mention this restriction, while the sections on Char16 and array/sequence of Char16 do (7.2.2.2.1.2.5 and 7.2.2.2.1.2.6).

And in the rationale in 7.4.1.1.2: ... could support sending UTF-16 encoded Unicode characters outside the BMP

So I'm fine with having no restrictions for writing CDR.

// must always call align for its side effects
if (bound != 0)
{
return align (sample, c, _Alignof (wchar_t), bound + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the size parameter be bound * sizeof (wchar_t) + sizeof (L'\0')?

*size = sizeof (wchar_t *);
} else {
*align = 1;
*size = bound;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should also use sizeof (wchar_t) and take terminating \0 into account

case DDS_XTypes_TK_STRING16: {
bool bounded = (type->xt._u.str16.bound > 0);
tb_type->type_code = bounded ? DDS_OP_VAL_BWSTR : DDS_OP_VAL_WSTR;
tb_type->args.string_args.max_size = type->xt._u.str8.bound + 1; // +1 for terminating L'\0'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tb_type->args.string_args.max_size = type->xt._u.str8.bound + 1; // +1 for terminating L'\0'
tb_type->args.string_args.max_size = type->xt._u.str16.bound + 1; // +1 for terminating L'\0'

@@ -415,16 +422,29 @@ static dds_return_t typebuilder_add_type (struct typebuilder_data *tbd, uint32_t
case DDS_XTypes_TK_STRING8: {
bool bounded = (type->xt._u.str8.bound > 0);
tb_type->type_code = bounded ? DDS_OP_VAL_BST : DDS_OP_VAL_STR;
tb_type->args.string_args.max_size = type->xt._u.str8.bound + 1; // +1 for terminating \0
tb_type->args.string_args.max_size = type->xt._u.str8.bound + 1; // +1 for terminating '\0'
tb_type->cdr_align = bounded ? 1 : 4; // unbounded string has 4 bytes length field in cdr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both bounded and unbounded strings have a 4 bytes length in the CDR, so this comment (which I wrote myself a few years ago) is wrong. It looks like the cdr_align field is not used anywhere, and can be removed.

static void dds_stream_skip_wstring (dds_istream_t * __restrict is)
{
const uint32_t length = dds_is_get4 (is);
dds_stream_skip_forward (is, length, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes length is in 2-byte units (see comment on write_wstring)

assert (maxsz > 0);
if (!read_and_normalize_uint32 (&sz, data, off, size, bswap))
return false;
if ((size - *off) / 2u < sz || maxsz - 1 < sz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assumes size is in 2-byte units

case DDS_OP_VAL_WSTR: case DDS_OP_VAL_BWSTR: {
uint32_t sz = dds_is_get4 (is);
dds_os_put4 (os, allocator, sz);
dds_os_put_bytes (os, allocator, is->m_buffer + is->m_index, 2u * sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on line 4653

@@ -4439,6 +4865,7 @@ static void dds_stream_read_key_impl (dds_istream_t * __restrict is, char * __re
case DDS_OP_VAL_2BY: *((uint16_t *) dst) = dds_is_get2 (is); break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[lines 4723 and 4805] WCHAR, WSTR and BWSTR cases missing (cast subtype and type to enum so that the compiler shows a warning?)

@@ -4754,7 +5228,7 @@ static bool prtf_simple_array (char * __restrict *buf, size_t * __restrict bufsi
}
break;
case DDS_OP_VAL_BLN: case DDS_OP_VAL_2BY: case DDS_OP_VAL_4BY: case DDS_OP_VAL_8BY:
case DDS_OP_VAL_STR: case DDS_OP_VAL_BST:
case DDS_OP_VAL_STR: case DDS_OP_VAL_BST: case DDS_OP_VAL_WCHAR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WSTR and BWSTR missing

It used to handle it the same as a DDS_OP_JEQ4 and tried to interpret the offset as a type
code. This seemingly never caused a problem before, until the introduction of wstring.

Signed-off-by: Erik Boasson <[email protected]>
dds_stream_reuse_wstring_bound / wstring_from_utf16 expect a bound including a terminating
0, so should not subtract 1 from bound in instruction.

Signed-off-by: Erik Boasson <[email protected]>
FOREACH moved from dynlib.s, FOREACH_PAIR is analogous.

Signed-off-by: Erik Boasson <[email protected]>
Signed-off-by: Erik Boasson <[email protected]>
@eboasson
Copy link
Contributor Author

eboasson commented Dec 5, 2024

@dpotman I believe the commits I just pushed address all the comments — and a few details extra that I discovered in the process. There are also now some tests for wstring support. Good thing I did that, because that's where 2fd2600 comes from.

Now the problem with the tests is that I disabled them on Windows, or more precisely: when using Microsoft's compiler. I know the general issue, which is that you need an extra round of macro expansion when playing games with __VA_ARGS__. I tried liberally sprinkling DDSRT_COUNT_PAIRS_MSVC_WORKAROUND all over DDSRT_FOREACH_PAIR... and that mostly fixes it, but that leaves a handful of preprocessor errors (of the same kind) that I am seem to unable to fix without a bit of trial-and-error.

I would've done that trial-and-error if only my little old Windows PC would still let me do a remote desktop session ... That leaves the CI, but for this kind of trial-and-error going through the CI is practically infeasible. Maybe some kind soul can help me out here ...

@dpotman
Copy link
Contributor

dpotman commented Dec 5, 2024

I would've done that trial-and-error if only my little old Windows PC would still let me do a remote desktop session ... That leaves the CI, but for this kind of trial-and-error going through the CI is practically infeasible. Maybe some kind soul can help me out here ...

With some trial-and-error, I think I've fixed it! 229733d

Signed-off-by: Dennis Potman <[email protected]>
Copy link
Contributor

@dpotman dpotman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good now!

@eboasson eboasson merged commit 30c079b into eclipse-cyclonedds:master Dec 9, 2024
18 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants