Include TabDDPM as a Synthesizer #2315

Open
celsofranssa opened this issue Dec 6, 2024 · 1 comment
Labels
feature request (Request for a new feature), under discussion (Issue is currently being discussed)

Comments

@celsofranssa

Problem Description

TabDDPM: Modelling Tabular Data with Diffusion Models

Expected behavior

Same as the other synthesizers.

Additional context

The TabDDPM paper evaluates the model extensively on a wide set of benchmarks. It demonstrates its superiority over existing SDV GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, it shows that TabDDPM is eligible for privacy-oriented setups, where the original data points cannot be publicly shared.

@npatki
Contributor

npatki commented Dec 10, 2024

Hi @celsofranssa, nice to meet you and thank you for your request. It is always great to see that our usage and documentation are working well for users :)

One of the reasons it has worked well is that we have developed a framework that all our synthesizers must follow. This includes distinct steps for data preprocessing, handling logical constraints, sampling (including conditional sampling), etc. (More on this in our blog post.) As such, it is not always trivial for us to support externally developed synthesizers, particularly if they differ from our expected framework in some way.
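To make the steps listed above concrete, here is a minimal sketch of what wrapping an external model behind such a framework could look like. Everything in it is an assumption for illustration: `SynthesizerSketch` and `BootstrapSynthesizer` are hypothetical names, not SDV classes, rows are plain dictionaries rather than DataFrames, constraints and conditions are enforced by simple rejection sampling, and the "model" just resamples training rows where a real wrapper would call into something like TabDDPM.

```python
import random
from abc import ABC, abstractmethod


class SynthesizerSketch(ABC):
    """Hypothetical wrapper interface; NOT SDV's actual base class."""

    def __init__(self, constraints=None, seed=0):
        # Assumed convention: each constraint is a function row -> bool.
        self.constraints = list(constraints or [])
        self._rng = random.Random(seed)
        self._fitted = False

    def preprocess(self, rows):
        # Preprocessing step: drop rows that contain missing values.
        return [dict(r) for r in rows if all(v is not None for v in r.values())]

    @abstractmethod
    def _fit(self, rows):
        """Train the underlying model on preprocessed rows."""

    @abstractmethod
    def _sample_one(self):
        """Draw one raw row from the underlying model."""

    def fit(self, rows):
        self._fit(self.preprocess(rows))
        self._fitted = True

    def _is_valid(self, row, conditions):
        meets_constraints = all(check(row) for check in self.constraints)
        meets_conditions = all(row.get(k) == v for k, v in conditions.items())
        return meets_constraints and meets_conditions

    def sample(self, num_rows, conditions=None, max_tries=10_000):
        # Sampling step: reject rows that break a constraint or a condition.
        if not self._fitted:
            raise RuntimeError("call fit() before sample()")
        conditions = conditions or {}
        accepted = []
        for _ in range(max_tries):
            if len(accepted) == num_rows:
                break
            row = self._sample_one()
            if self._is_valid(row, conditions):
                accepted.append(row)
        if len(accepted) < num_rows:
            raise RuntimeError("could not draw enough valid rows")
        return accepted


class BootstrapSynthesizer(SynthesizerSketch):
    """Stand-in model that resamples training rows."""

    def _fit(self, rows):
        if not rows:
            raise ValueError("no rows left after preprocessing")
        self._rows = rows

    def _sample_one(self):
        return dict(self._rng.choice(self._rows))


training_rows = [
    {"age": 30, "income": 50, "city": "A"},
    {"age": 17, "income": 0, "city": "B"},
    {"age": None, "income": 20, "city": "B"},
    {"age": 45, "income": 80, "city": "B"},
]
synthesizer = BootstrapSynthesizer(constraints=[lambda row: row["age"] >= 18])
synthesizer.fit(training_rows)
adults_in_b = synthesizer.sample(5, conditions={"city": "B"})
```

Rejection sampling is only the simplest way to honor constraints and conditions; it becomes slow when valid rows are rare, which is one reason a model that does not natively fit the framework can be hard to support.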

We can certainly keep this issue open as we decide when/how to prioritize.

> It demonstrates its superiority over existing SDV GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields.

It would be interesting to look into our SDGym library, which provides an easy way to incorporate a custom synthesizer for the purposes of benchmarking. I understand the original paper provided some results too. Our SDGym library is designed to provide a comparison that standardizes the datasets and metrics across all synthesizers.
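The idea of standardizing datasets and metrics across synthesizers can be sketched in a few lines. This is not SDGym's API: the function names (`train_bootstrap`, `train_independent`, `mean_gap`, `benchmark`), the dictionary-based data format, and the toy metric are all assumptions made for illustration. Each synthesizer is represented as a pair of functions, one that trains and one that samples, and every pair is scored on the same data with the same metric.

```python
import random
import statistics


def train_bootstrap(data, metadata):
    # Stand-in "model": keep a copy of the training rows.
    return {"rows": [dict(r) for r in data], "rng": random.Random(0)}


def sample_bootstrap(model, n_samples):
    return [dict(model["rng"].choice(model["rows"])) for _ in range(n_samples)]


def train_independent(data, metadata):
    # Stand-in "model": remember each column's values separately.
    columns = {name: [r[name] for r in data] for name in data[0]}
    return {"columns": columns, "rng": random.Random(0)}


def sample_independent(model, n_samples):
    rng = model["rng"]
    return [
        {name: rng.choice(values) for name, values in model["columns"].items()}
        for _ in range(n_samples)
    ]


def mean_gap(real, synthetic, column):
    # Toy metric (hypothetical): absolute difference between column means.
    real_mean = statistics.fmean(r[column] for r in real)
    synthetic_mean = statistics.fmean(r[column] for r in synthetic)
    return abs(real_mean - synthetic_mean)


def benchmark(synthesizers, datasets, n_samples=200):
    # Every synthesizer sees the same datasets and is scored the same way.
    results = []
    for dataset_name, (data, metadata) in datasets.items():
        for synth_name, (train_fn, sample_fn) in synthesizers.items():
            model = train_fn(data, metadata)
            synthetic = sample_fn(model, n_samples)
            for column in metadata["numeric"]:
                results.append({
                    "dataset": dataset_name,
                    "synthesizer": synth_name,
                    "column": column,
                    "mean_gap": mean_gap(data, synthetic, column),
                })
    return results


real = [{"age": 20 + i, "income": 1000 + 50 * i} for i in range(50)]
datasets = {"toy": (real, {"numeric": ["age", "income"]})}
synthesizers = {
    "bootstrap": (train_bootstrap, sample_bootstrap),
    "independent": (train_independent, sample_independent),
}
results = benchmark(synthesizers, datasets)
```

A TabDDPM wrapper would slot in as one more train/sample pair, so its scores would be directly comparable with those of the other entries.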

@npatki added the "under discussion" label (Issue is currently being discussed) and removed the "new" label (Automatic label applied to new issues) on Dec 10, 2024