Hi @josalhor, thanks for filing this issue with all the details. Our investigation showed that this issue is probably not related to TVAE, as the same error can be replicated with a different synthesizer, such as Gaussian Copula.
Root Cause
The BinaryDecisionTreeClassifier metric cannot be run on certain combinations of real/synthetic data.
The metric is designed to take the following steps:
1. Train the ML model using the synthetic data
2. Test the ML model using the real data
The problem is that the synthetic data may not have full coverage of all the possible categories. For example, assume only 0.1% of the real data has a particular category value such as 'supdup'. It's possible (due to random chance) that none of the synthetic data has this value. In this case, the Binary Classification algorithm fails because the value is seen for the first time during testing.
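To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the SDMetrics implementation: `StrictCategoryEncoder` and `missing_categories` are hypothetical names invented for illustration, and the toy columns are made up. The encoder learns its categories from the synthetic column (the training step) and raises a `ValueError` when the real column (the testing step) contains a value it has never seen.

```python
# Hypothetical stand-in for a category encoder inside a classifier pipeline.
# This is an illustration only, NOT the code used by SDMetrics.
class StrictCategoryEncoder:
    def fit(self, values):
        # Learn an integer code for every category seen during training.
        self.mapping_ = {v: i for i, v in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        # Refuse any category that was not present during fit().
        unknown = sorted(set(values) - set(self.mapping_))
        if unknown:
            raise ValueError(f"Found unknown categories {unknown} during transform")
        return [self.mapping_[v] for v in values]


def missing_categories(real_values, synthetic_values):
    # Hypothetical helper: categories in the real data that the synthetic data lacks.
    return sorted(set(real_values) - set(synthetic_values))


# Toy data (assumed): the rare value 'supdup' never made it into the synthetic column.
synthetic_column = ["a", "b", "a", "b"]
real_column = ["a", "b", "supdup"]

encoder = StrictCategoryEncoder().fit(synthetic_column)  # step 1: train on synthetic
try:
    encoder.transform(real_column)                       # step 2: test on real
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(missing_categories(real_column, synthetic_column))
```

A check like `missing_categories` could be run before the metric to see whether a given real/synthetic pair is likely to hit this problem.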
I'm updating the title of this issue to reflect the findings.
I've also started a new feature request in the underlying SDMetrics library: sdv-dev/SDMetrics#515. We can continue our discussion there.
In the meantime, I wonder if any other metric will be suitable for your purposes? (The Binary Classification metrics are listed as "in Beta" by the SDMetrics docs.)
npatki changed the title from "TVAE unkown category" to "Binary Classification metric fails with unknown category (ValueError)" on Nov 13, 2023
Your description of the problem makes a lot of sense and matches my findings.
> In the meantime, I wonder if any other metric will be suitable for your purposes? (The Binary Classification metrics are listed as "in Beta" by the SDMetrics docs.)
Actually, I was trying my best to replicate the CTGAN paper results, so I will take a look at the error and try to patch it if possible.
> I've also started a new feature request in the underlying SDMetrics library: sdv-dev/SDMetrics#515. We can continue our discussion there.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
I am trying to run this:
This produces the following error:
Steps to reproduce
Just running the above snippet produces the output.