Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing a String field inside a flattende struct fails if the field contains a valid integer #344

Open
tyilo opened this issue Dec 13, 2023 · 11 comments

Comments

@tyilo
Copy link

tyilo commented Dec 13, 2023

What version of the csv crate are you using?

1.3.0

Briefly describe the question, bug or feature request.

When deserializing to a String field using serde, csv can handle a field containing 123.
However, when doing the same to a flattened field, csv can no longer deserialize 123 to the String field and I get the error:

Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 24,
                    line: 4,
                    record: 3,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: integer `123`, expected a string",
                ),
            },
        },
    ),
)

Include a complete program demonstrating a problem.

use serde::Deserialize;

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct Inner {
    inner_str: String,
}

#[allow(dead_code)]
#[derive(Debug, Deserialize)]
struct Row {
    str: String,

    #[serde(flatten)]
    inner: Inner,
}

fn main() {
    let source = r#"
str,inner_str
A,A
123,A
A,123
    "#
    .trim();

    let mut reader = csv::Reader::from_reader(source.as_bytes());
    for line in reader.deserialize::<Row>() {
        dbg!(&line);
    }
}

What is the observed behavior of the code above?

Output:

[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "123",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 24,
                    line: 4,
                    record: 3,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: integer `123`, expected a string",
                ),
            },
        },
    ),
)

What is the expected or desired behavior of the code above?

Output:

[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "123",
        inner: Inner {
            inner_str: "A",
        },
    },
)
[src/main.rs:29] &line = Ok(
    Row {
        str: "A",
        inner: Inner {
            inner_str: "123",
        },
    },
)
@BurntSushi
Copy link
Owner

This does indeed look like a bug. I don't know when I'll have time to look into it, so patches are welcome if you can find a fix yourself.

I'll note that the presence of serde(flatten) concerns me. IIRC, it required some hacky things to make it work and it may not have been done correctly.

@tyilo
Copy link
Author

tyilo commented Dec 13, 2023

It seems like serde is calling deserialize_string for str, but deserialize_any for inner_str. Is this a bug in serde?

@BurntSushi
Copy link
Owner

I don't know. I don't know much about serde internals. I do know that the serde(flatten) thing is a bit of a tortured feature though. It's one of those things that folks don't have to actually use, but it affords some very nice conveniences. But it just doesn't work in all cases.

I won't have time to look into this any time soon I'm afraid.

@tyilo
Copy link
Author

tyilo commented Dec 13, 2023

Yeah, it seems to be unsupported in serde: serde-rs/serde#1881

That's unfortunate :/

@Anders429
Copy link

I believe the issue here is that CSV is not actually a "self-describing" format, and therefore should not support deserialize_any(). A single element in a CSV does not encode it's type; you can't tell if 12345 is an integer or a string. The deserializer needs to rely on the the type to tell it how to interpret the data, it can't determine it itself.

#[serde(flatten)] is only supported for self-describing formats because it needs to parse the tokens ahead of time using deserialize_any(), since it doesn't know what types the nested fields contain yet. Since the deserialization then simply "guesses" the type, it might guess wrong and you get issues like this.

As far as I can tell, there's no way to actually fix this without fundamentally changing the csv format to be actually self-describing. IMO there shouldn't be a deserialize_any() implementation for this format at all, although that's probably hard to walk back at this point.

See also: serde.rs's documentation for using deserialize_any().

@BurntSushi
Copy link
Owner

@Anders429 That is a very insightful comment, thank you. I don't think I realized myself at all that flatten support depended on deserialize_any.

@Anders429
Copy link

It's definitely surprising, because you could easily implement most #[serde(flatten)] use cases by hand without needing to call deserialize_any(). I found a couple issues on serde's repository that ask questions about this.

Seems like flattening is hard to support for serde's derive macro, because the macro can't know anything about nested types, and the nested types can't be partially deserialized. It's a limitation that makes flattening pretty cursed; it seems that quick-xml has the same type of issue: tafia/quick-xml#286.

@hydra
Copy link

hydra commented Aug 13, 2024

I ran into the exact same issue today, here's my code:

#[derive(Debug, serde::Deserialize)]
#[serde(rename_all(deserialize = "PascalCase"))]
pub struct PartMappingRecord {
    eda: CSVEdaToolValue,

    manufacturer: String,
    mpn: String,

    #[serde(flatten)]
    fields: HashMap<String, String>,
}

the csv file:

"Eda","Manufacturer","Mpn","Name","Value"
"DipTrace","Molex","0479480001","0479480001",""

the error:

Error: Deserializing part mapping record

Caused by:
    CSV deserialize error: record 1 (line: 1, byte: 42): invalid type: integer `479480001`, expected a string

In the above example, the "Mpn" column's value is ok, but the "Name" column, which is flattened into fields is produces an error.

the source:
https://github.com/MakerPnP/makerpnp/blob/master/src/stores/part_mappings.rs#L20

For my use-case, I need to support any number of arbitrarily named csv columns, since each different EDA tool that generates placement CSV files will always have their own column names and the part mappings file has to use the same column names too.

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1, for example:

image

Suggestions on workarounds or solutions greatly appreciated!

@BurntSushi
Copy link
Owner

I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use csv::StringRecord and it will handle literally everything and give you infinite flexibility.

@BurntSushi
Copy link
Owner

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,

It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line

If you are experiencing the opposite, please open a new issue with an MRE.

@hydra
Copy link

hydra commented Aug 14, 2024

I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use csv::StringRecord and it will handle literally everything and give you infinite flexibility.

My project uses serde with serde-derive for other serialization tasks so it made sense to use it for CSV files too. I'm just about to start investigating a solution/work-around to this issue and will keep that in mind. thanks for the timely reminder!

Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,

It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line

If you are experiencing the opposite, please open a new issue with an MRE.

ok, I will check again and future errors and will open issues as required.

hydra added a commit to MakerPnP/makerpnp that referenced this issue Aug 14, 2024
* 'flatten' doesn't work if one of the fields contains only integers.
* See BurntSushi/rust-csv#344 (comment)

Example CSV file that causes an issue is as follows:
```
"Eda","Manufacturer","Mpn","Name","Value"
"DipTrace","Molex","0479480001","0479480001",""
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants