Unable to use a custom synthesizer with a timeout on a Colab Notebook #368

Open
npatki opened this issue Dec 11, 2024 · 0 comments
Labels
bug Something isn't working

npatki commented Dec 11, 2024

Environment Details

  • SDGym version: 0.9.1 (latest)
  • Operating System: Linux, environment is a Colab Notebook

Error Description

In a Colab Notebook environment, I am unable to get results for custom synthesizers if I supply a timeout value. The synthesizer shows up in the results DataFrame, but all the associated values for it are NaN (even the ones for dataset size, initialization, etc.). All of the other, pre-defined synthesizers produce values.
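To confirm the symptom programmatically rather than by eye, a small helper along these lines could flag synthesizers whose result rows are entirely empty. This is only a sketch: the helper name is hypothetical, and the identifier column names `Synthesizer` and `Dataset` are assumptions about the layout of the results DataFrame, so adjust them to match what the benchmark actually returns.

```python
import pandas as pd


def find_all_nan_synthesizers(results, id_columns=("Synthesizer", "Dataset")):
    """Return the names of synthesizers whose measurement columns are all NaN.

    `id_columns` lists the columns that identify a row; these names are an
    assumption about the results layout, not something taken from SDGym.
    """
    # Every column that is not an identifier is treated as a measurement.
    value_columns = [col for col in results.columns if col not in id_columns]
    # A row counts as empty when every measurement column is NaN.
    empty_rows = results[value_columns].isna().all(axis=1)
    return sorted(results.loc[empty_rows, "Synthesizer"].unique())
```

Calling `find_all_nan_synthesizers(results)` on the benchmark output would then list the affected synthesizers, which makes it easier to compare a Colab run against a terminal run.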

This problem goes away if I remove the timeout value, or if I run the script on my local machine instead. So it is the combination of the following that causes the issue:

  • (a) running on a Colab notebook (or likely any interactive environment), and
  • (b) adding a custom synthesizer to the benchmark, and
  • (c) adding a timeout
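Since removing the timeout avoids the problem, one possible stopgap is to drop the timeout only when the code detects that it is running on Colab. The sketch below is a workaround idea, not a fix: both helper names are hypothetical, and checking for `google.colab` among the loaded modules is a common heuristic rather than an official detection API.

```python
import sys


def running_in_colab(loaded_modules=None):
    """Heuristic check for a Colab runtime.

    Assumes the `google.colab` package is already loaded in a Colab kernel.
    `loaded_modules` exists only so the check can be exercised outside Colab.
    """
    modules = sys.modules if loaded_modules is None else loaded_modules
    return "google.colab" in modules


def effective_timeout(requested_timeout, loaded_modules=None):
    """Return None (no timeout) on Colab, else the requested timeout in seconds."""
    if running_in_colab(loaded_modules):
        return None
    return requested_timeout


# Intended use in the benchmark call (not executed here):
#     results = sdgym.benchmark_single_table(..., timeout=effective_timeout(20 * 60))
```

The trade-off is that a slow or stuck synthesizer will no longer be interrupted on Colab, so this only makes sense for synthesizers that are known to finish in reasonable time.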

Steps to reproduce

The code below creates a custom synthesizer that is just a variant of GaussianCopula (setting marginals to uniform). Then it tries to run the benchmark for it.

import sdgym

from sdv.metadata import Metadata
from sdv.single_table import GaussianCopulaSynthesizer
from sdgym import create_single_table_synthesizer

def get_trained_synthesizer(data, metadata):
    metadata_obj = Metadata.load_from_dict(metadata)
    synthesizer = GaussianCopulaSynthesizer(metadata_obj, default_distribution='uniform')
    synthesizer.fit(data)
    return synthesizer

def sample_from_synthesizer(synthesizer, n_rows):
    return synthesizer.sample(n_rows)

GCUniformSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='GCUniform'
)

results = sdgym.benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer'],
    custom_synthesizers=[GCUniformSynthesizer],
    sdv_datasets=['KRK_v1'],
    limit_dataset_size=False,
    timeout=20*60, # 20 min
    output_filepath='results.csv',
    detailed_results_folder='/content/results',
    sdmetrics=[]
)

This script works as expected when run from a terminal. But if I run it on a Colab Notebook, NaN values are produced (screenshot omitted).

Additional Context

According to @frances-h: We ran into a similar issue when working on this PR.

@npatki added the bug label Dec 11, 2024
@npatki changed the title from "Unable to use a custom synthesize with a timeout on a Colab Notebook" to "Unable to use a custom synthesizer with a timeout on a Colab Notebook" Dec 11, 2024