Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

tyilo · 2023-12-13T13:07:41Z

What version of the `csv` crate are you using?

1.3.0

Briefly describe the question, bug or feature request.

When deserializing to a String field using serde, csv can handle a field containing 123.
However, when doing the same to a flattened field, csv can no longer deserialize 123 to the String field and I get the error:

Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 24,
                    line: 4,
                    record: 3,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: integer `123`, expected a string",
                ),
            },
        },
    ),
)

Include a complete program demonstrating a problem.

use serde::Deserialize;

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct Inner {
    inner_str: String,
}

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct Row {
    str: String,

    #[serde(flatten)]
    inner: Inner,
}

fn main() {
    let source = r#"
str,inner_str
A,A
123,A
A,123
    "#
    .trim();

    let mut reader = csv::Reader::from_reader(source.as_bytes());
    for line in reader.deserialize::<Row>() {
        dbg!(&line);
    }
}

What is the observed behavior of the code above?

Output:

[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "123",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 24,
                    line: 4,
                    record: 3,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: integer `123`, expected a string",
                ),
            },
        },
    ),
)

What is the expected or desired behavior of the code above?

Output:

[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "123",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "123",
        },
    },
)

The text was updated successfully, but these errors were encountered:

BurntSushi · 2023-12-13T13:30:29Z

This does indeed look like a bug. I don't know when I'll have time to look into it, so patches are welcome if you can find a fix yourself.

I'll note that the presence of serde(flatten) concerns me. IIRC, it required some hacky things to make it work and it may not have been done correctly.

tyilo · 2023-12-13T13:45:05Z

It seems like serde is calling deserialize_string for str, but deserialize_any for inner_str. Is this a bug in serde?

BurntSushi · 2023-12-13T13:49:43Z

I don't know. I don't know much about serde internals. I do know that the serde(flatten) thing is a bit of a tortured feature though. It's one of those things that folks don't have to actually use, but it affords some very nice conveniences. But it just doesn't work in all cases.

I won't have time to look into this any time soon I'm afraid.

tyilo · 2023-12-13T14:38:09Z

Yeah, it seems to be unsupported in serde: serde-rs/serde#1881

That's unfortunate :/

Anders429 · 2024-08-11T01:44:53Z

I believe the issue here is that CSV is not actually a "self-describing" format, and therefore should not support deserialize_any(). A single element in a CSV does not encode it's type; you can't tell if 12345 is an integer or a string. The deserializer needs to rely on the the type to tell it how to interpret the data, it can't determine it itself.

#[serde(flatten)] is only supported for self-describing formats because it needs to parse the tokens ahead of time using deserialize_any(), since it doesn't know what types the nested fields contain yet. Since the deserialization then simply "guesses" the type, it might guess wrong and you get issues like this.

As far as I can tell, there's no way to actually fix this without fundamentally changing the csv format to be actually self-describing. IMO there shouldn't be a deserialize_any() implementation for this format at all, although that's probably hard to walk back at this point.

See also: serde.rs's documentation for using deserialize_any().

BurntSushi · 2024-08-11T01:54:00Z

@Anders429 That is a very insightful comment, thank you. I don't think I realized myself at all that flatten support depended on deserialize_any.

Anders429 · 2024-08-11T02:14:51Z

It's definitely surprising, because you could easily implement most #[serde(flatten)] use cases by hand without needing to call deserialize_any(). I found a couple issues on serde's repository that ask questions about this.

Seems like flattening is hard to support for serde's derive macro, because the macro can't know anything about nested types, and the nested types can't be partially deserialized. It's a limitation that makes flattening pretty cursed; it seems that quick-xml has the same type of issue: tafia/quick-xml#286.

hydra · 2024-08-13T12:24:32Z

I ran into the exact same issue today, here's my code:

#[derive(Debug, serde::Deserialize)]
#[serde(rename_all(deserialize = "PascalCase"))]
pub struct PartMappingRecord {
    eda: CSVEdaToolValue,

    manufacturer: String,
    mpn: String,

    #[serde(flatten)]
    fields: HashMap<String, String>,
}

the csv file:

"Eda","Manufacturer","Mpn","Name","Value"
"DipTrace","Molex","0479480001","0479480001",""

the error:

Error: Deserializing part mapping record

Caused by:
    CSV deserialize error: record 1 (line: 1, byte: 42): invalid type: integer `479480001`, expected a string

In the above example, the "Mpn" column's value is ok, but the "Name" column, which is flattened into fields is produces an error.

the source:
https://github.com/MakerPnP/makerpnp/blob/master/src/stores/part_mappings.rs#L20

For my use-case, I need to support any number of arbitrarily named csv columns, since each different EDA tool that generates placement CSV files will always have their own column names and the part mappings file has to use the same column names too.

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1, for example:

Suggestions on workarounds or solutions greatly appreciated!

BurntSushi · 2024-08-13T12:31:38Z

I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use csv::StringRecord and it will handle literally everything and give you infinite flexibility.

BurntSushi · 2024-08-13T12:33:13Z

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,

It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line

If you are experiencing the opposite, please open a new issue with an MRE.

hydra · 2024-08-14T08:43:41Z

I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use csv::StringRecord and it will handle literally everything and give you infinite flexibility.

My project uses serde with serde-derive for other serialization tasks so it made sense to use it for CSV files too. I'm just about to start investigating a solution/work-around to this issue and will keep that in mind. thanks for the timely reminder!

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,

It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line

If you are experiencing the opposite, please open a new issue with an MRE.

ok, I will check again and future errors and will open issues as required.

* 'flatten' doesn't work if one of the fields contains only integers. * See BurntSushi/rust-csv#344 (comment) Example CSV file that causes an issue is as follows: ``` "Eda","Manufacturer","Mpn","Name","Value" "DipTrace","Molex","0479480001","0479480001","" ```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

tyilo commented Dec 13, 2023

BurntSushi commented Dec 13, 2023

tyilo commented Dec 13, 2023

BurntSushi commented Dec 13, 2023

tyilo commented Dec 13, 2023

Anders429 commented Aug 11, 2024

BurntSushi commented Aug 11, 2024

Anders429 commented Aug 11, 2024

hydra commented Aug 13, 2024

BurntSushi commented Aug 13, 2024

BurntSushi commented Aug 13, 2024

hydra commented Aug 14, 2024

Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

Comments

tyilo commented Dec 13, 2023

What version of the csv crate are you using?

Briefly describe the question, bug or feature request.

Include a complete program demonstrating a problem.

What is the observed behavior of the code above?

What is the expected or desired behavior of the code above?

BurntSushi commented Dec 13, 2023

tyilo commented Dec 13, 2023

BurntSushi commented Dec 13, 2023

tyilo commented Dec 13, 2023

Anders429 commented Aug 11, 2024

BurntSushi commented Aug 11, 2024

Anders429 commented Aug 11, 2024

hydra commented Aug 13, 2024

BurntSushi commented Aug 13, 2024

BurntSushi commented Aug 13, 2024

hydra commented Aug 14, 2024

What version of the `csv` crate are you using?