Problem Description

Some of the parameters in the gretel-synthetics implementation in SDGym can cause the model to fail during evaluation, and others could be tuned to generate better synthetic data (details below).
Expected behavior
In `sdgym/synthesizers/gretel.py` there are a few updates I'd recommend making:

- Add `learning_rate` as a parameter, defaulting to `0.001` as per the Gretel docs.
- Add `field_cluster_size` as a tunable parameter.
- In `batcher.generate_all_batch_lines`, set a higher default `max_invalid`. For larger datasets, the default value of `1000` can cause the model to terminate unnecessarily during sampling.
- Set `epochs` to a standard value (e.g. 100); there's no need to derive epochs from the number of columns. Early stopping and a validation set can be used to prevent overfitting.
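To make the proposal concrete, here is a minimal sketch of the suggested defaults as a plain dict, kept independent of any particular gretel-synthetics version. The specific values for `max_invalid` and `field_cluster_size` below are illustrative placeholders, not values from the Gretel docs:

```python
# Suggested parameter defaults for the Gretel synthesizer in SDGym.
# Expressed as a plain dict so it does not depend on the exact
# gretel-synthetics config API; names mirror the list above.
SUGGESTED_GRETEL_PARAMS = {
    "learning_rate": 0.001,    # default per Gretel docs
    "epochs": 100,             # fixed standard value; rely on early stopping,
                               # not the column count, to avoid overfitting
    "max_invalid": 5000,       # hypothetical raised limit: the default of 1000
                               # can abort sampling prematurely on large datasets
    "field_cluster_size": 20,  # hypothetical default; exposed as a tunable
                               # parameter rather than hard-coded
}
```

These would be surfaced as constructor arguments on the SDGym synthesizer, so baseline runs keep today's behavior while benchmarks can override them.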
Additional context
I'm happy to submit a PR with fixes and to compare against baseline config for tests, let me know if this would be okay. Cheers!
Hi @zredlined, thanks for taking a look at the Gretel implementation in SDGym! Please feel free to submit a PR with the updates you have proposed. You can link it to this issue and we'll be happy to take a look.