Problem Description
As described in #546, I may want to ignore certain columns in a dataset when running a report (quality or diagnostic). It is not completely intuitive how to do this.
The metadata requires every column to be described, so you cannot ask a report to ignore a column simply by removing it from the metadata.
It is unclear from the metadata spec which columns will be ignored and which will be used for evaluation.
Actual Solution: If you mark a column with an "other" sdtype (i.e., not categorical, numerical, datetime, etc.), SDV assumes it is non-statistical PII and ignores the column. For example, setting the sdtype to 'text' is enough to make a report ignore the column.
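A minimal sketch of this workaround, assuming SDMetrics-style single-table metadata of the form `{"columns": {"<name>": {"sdtype": ...}}}`. The helper `mark_columns_as_text` is hypothetical (not part of SDV or SDMetrics): it copies the metadata and sets the sdtype of each column to ignore to 'text', leaving the original dict untouched.

```python
import copy


def mark_columns_as_text(metadata, ignore):
    """Return a copy of `metadata` where every column in `ignore` has sdtype 'text'.

    Hypothetical helper. Assumes metadata shaped like
    {"columns": {"<name>": {"sdtype": "<sdtype>"}, ...}}.
    Columns with an "other" sdtype such as 'text' are treated as
    non-statistical, so a report would skip them.
    """
    missing = set(ignore) - set(metadata["columns"])
    if missing:
        # Fail loudly instead of silently ignoring a typo in a column name.
        raise KeyError(f"Columns not in metadata: {sorted(missing)}")

    patched = copy.deepcopy(metadata)  # never mutate the caller's metadata
    for name in ignore:
        # Replace the whole column spec; type-specific keys
        # (e.g. a datetime format) do not apply to 'text'.
        patched["columns"][name] = {"sdtype": "text"}
    return patched
```

The patched metadata would then be passed to the report in place of the original, e.g. `QualityReport().generate(real_data, synthetic_data, patched)` from `sdmetrics.reports.single_table`, assuming that API in your SDMetrics version. The drawback is that the metadata no longer describes the column's real type, which is one argument for a report-level option instead.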
Expected behavior
The metadata spec should probably remain as-is, because in the future we may decide to add metrics for specific sdtypes.
However, perhaps the report itself should allow you to specify which columns to ignore?
Another use case: the visualization phase after a Quality Report is generated.
If a table has a large number of columns, the generated visualizations become hard to interact with and to draw insights from. This is an example from the loan_applications dataset:
If I want to focus on ~10 columns in the Quality Report, there is no easy way to do this natively. A potential solution could take either of two forms:
ignoring columns in the Quality Report itself, or
ignoring columns when generating visualizations after the full Quality Report is generated
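Until such an option exists, one way to approximate the first form is to drop the unwanted columns from both the data and the metadata before generating the report. A minimal sketch, assuming SDMetrics-style single-table metadata `{"columns": {...}}`; `restrict_metadata` is a hypothetical helper, and it deliberately drops top-level keys such as a primary key, since they may point at a filtered-out column.

```python
import copy


def restrict_metadata(metadata, keep):
    """Return metadata that describes only the columns listed in `keep`.

    Hypothetical helper. Assumes metadata shaped like
    {"columns": {"<name>": {"sdtype": ...}, ...}}. Only the "columns"
    entry is carried over, in the order given by `keep`; other top-level
    keys are dropped because they may reference removed columns.
    """
    missing = [name for name in keep if name not in metadata["columns"]]
    if missing:
        raise KeyError(f"Columns not in metadata: {missing}")

    # Deep-copy each column spec so the result is independent of the input.
    return {"columns": {name: copy.deepcopy(metadata["columns"][name]) for name in keep}}
```

With pandas DataFrames, `real_data[keep]` and `synthetic_data[keep]` select the same columns, so all three can be passed to the report together, and any visualizations produced afterwards should cover only those columns.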