npatki changed the title from "When data is in a SAS file, pandas reads in missing values as empty strings" to "When data is in an SAS file, pandas reads in missing values as empty strings" on Dec 2, 2024
I'm filing this issue on behalf of a Slack user.
Environment Details
Error Description
I have a SAS file that I am loading into Python using the pandas.read_sas function. When I do this, the missing values in my datetime column are loaded in as empty strings ('') rather than the np.nan values that SDV expects. This causes the SDV synthesizer to crash when fitting.
Steps to reproduce
Reading in the data:
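A minimal sketch of this step, assuming a hypothetical file name (my_data.sas7bdat) and a hypothetical helper, load_sas_table. The encoding argument is optional, and the right value depends on how the file was written.

```python
import pandas as pd


def load_sas_table(path, encoding=None):
    """Read a SAS file into a DataFrame.

    `path` and this helper are hypothetical; `encoding` is passed straight
    through to pandas.read_sas (None is the pandas default).
    """
    return pd.read_sas(path, encoding=encoding)


# Example usage (hypothetical file name):
# data = load_sas_table('my_data.sas7bdat', encoding='utf-8')
# print(data.dtypes)  # a datetime column holding '' shows up as a text column
```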
Making sure the metadata correctly identifies datetime columns:
However, because the missing values are read in as empty strings, any SDV synthesizer will throw an error when fitting (originating from validate):
Workaround
To work around this, replace the empty strings with np.nan before fitting any SDV synthesizer.
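The replacement can be sketched as follows. The DataFrame, its column names, and the date format are hypothetical stand-ins for a table returned by pandas.read_sas. Parsing the column with pd.to_datetime afterwards is an optional extra step, not something the workaround requires.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a table loaded with pd.read_sas, where the
# missing datetime value arrived as an empty string.
data = pd.DataFrame({
    'id': [1, 2, 3],
    'start_date': ['2024-01-15', '', '2024-03-02'],
})

# Replace cells that are exactly '' with np.nan; other values are untouched.
cleaned = data.replace('', np.nan)

# Optional: parse the column so missing entries become NaT.
# The format string is an assumption about the hypothetical data above.
cleaned['start_date'] = pd.to_datetime(cleaned['start_date'], format='%Y-%m-%d')

print(cleaned['start_date'].isna().tolist())
```

If the text columns were read without an encoding, the values may be bytes rather than str, in which case the value to replace would be b'' instead of ''.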
Additional Details
If the original data is in CSV format, then pd.read_csv will read in missing values as np.nan. This is compatible with SDV.
To solve the issue with the SAS file format, we need to make a few decisions first: