
Can't download datasets if .aws config is present #238

Open
pvk-developer opened this issue May 29, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@pvk-developer
Member

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDGym version: 0.6.1
  • Python version: Any
  • Operating System: MacOS / Unix / Ubuntu

Error Description

When running SDGym in a local environment that happens to have a `~/.aws/` folder with AWS configuration in it, dataset downloads fail: boto3 picks up the local credentials and signs the request to the public SDGym bucket, and if the key is not valid for that bucket you end up getting the following error:

ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.

Steps to reproduce

To reproduce, create a `.aws` folder in your home directory (`mkdir ~/.aws`), then create a file inside it called `credentials` with the following contents:

[default]
aws_access_key_id = <your id>
aws_secret_access_key = <your access key>
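The `credentials` file above is a standard INI file read by the AWS SDK's shared-credentials mechanism. As a quick sanity check that the file is in place and parseable, here is a minimal standard-library sketch (the path and the `default` profile name follow the defaults used in the steps above):

```python
import configparser
from pathlib import Path

# Default location of the shared credentials file created above.
creds_path = Path.home() / ".aws" / "credentials"

# The file is plain INI, so configparser can read it directly.
parser = configparser.ConfigParser()
parser.read(creds_path)

if "default" in parser:
    # These are the keys boto3's default credential chain picks up,
    # which is why a stale entry here breaks SDGym's downloads.
    print(sorted(parser["default"].keys()))
```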

Note: make sure you have cleared the cache of previously downloaded datasets first; otherwise the local copies are used and no download is attempted.

import sdgym

sdgym.benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer'],
    sdv_datasets=['student_placements'],
    timeout=22,
)
---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
Cell In[4], line 1
----> 1 sdgym.benchmark_single_table(synthesizers=['GaussianCopulaSynthesizer'], sdv_datasets=['student_placements'], timeout=22)

File ~/Projects/sdv-dev/SDGym/sdgym/benchmark.py:507, in benchmark_single_table(synthesizers, custom_synthesizers, sdv_datasets, additional_datasets_folder, limit_dataset_size, compute_quality_score, sdmetrics, timeout, output_filepath, detailed_results_folder, show_progress, multi_processing_config)
    503 _validate_inputs(output_filepath, detailed_results_folder, synthesizers, custom_synthesizers)
    505 _create_detailed_results_directory(detailed_results_folder)
--> 507 job_args_list = _generate_job_args_list(
    508     limit_dataset_size, sdv_datasets, additional_datasets_folder, sdmetrics,
    509     detailed_results_folder, timeout, compute_quality_score, synthesizers, custom_synthesizers)
    511 scores = _run_jobs(multi_processing_config, job_args_list, show_progress)
    512 if output_filepath:

File ~/Projects/sdv-dev/SDGym/sdgym/benchmark.py:90, in _generate_job_args_list(limit_dataset_size, sdv_datasets, additional_datasets_folder, sdmetrics, detailed_results_folder, timeout, compute_quality_score, synthesizers, custom_synthesizers)
     88 datasets = []
     89 if sdv_datasets is not None:
---> 90     datasets = get_dataset_paths(sdv_datasets, None, None, None, None)
     92 if additional_datasets_folder:
     93     additional_datasets = get_dataset_paths(None, None, additional_datasets_folder, None, None)

File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:200, in get_dataset_paths(datasets, datasets_path, bucket, aws_key, aws_secret)
    196     else:
    197         datasets = _get_available_datasets(
    198             'single_table', bucket=bucket)['dataset_name'].tolist()
--> 200 return [
    201     _get_dataset_path('single_table', dataset, datasets_path, bucket, aws_key, aws_secret)
    202     for dataset in datasets
    203 ]

File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:201, in <listcomp>(.0)
    196     else:
    197         datasets = _get_available_datasets(
    198             'single_table', bucket=bucket)['dataset_name'].tolist()
    200 return [
--> 201     _get_dataset_path('single_table', dataset, datasets_path, bucket, aws_key, aws_secret)
    202     for dataset in datasets
    203 ]

File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:60, in _get_dataset_path(modality, dataset, datasets_path, bucket, aws_key, aws_secret)
     57     if local_path.exists():
     58         return local_path
---> 60 download_dataset(
     61     modality, dataset, dataset_path, bucket=bucket, aws_key=aws_key, aws_secret=aws_secret)
     62 return dataset_path

File ~/Projects/sdv-dev/SDGym/sdgym/datasets.py:36, in download_dataset(modality, dataset_name, datasets_path, bucket, aws_key, aws_secret)
     34 LOGGER.info('Downloading dataset %s from %s', dataset_name, bucket)
     35 s3 = get_s3_client(aws_key, aws_secret)
---> 36 obj = s3.get_object(Bucket=bucket_name, Key=f'{modality.upper()}/{dataset_name}.zip')
     37 bytes_io = io.BytesIO(obj['Body'].read())
     39 LOGGER.info('Extracting dataset into %s', datasets_path)

File ~/.virtualenvs/SDGym/lib/python3.8/site-packages/botocore/client.py:530, in ClientCreator._create_api_method.<locals>._api_call(self, *args, **kwargs)
    526     raise TypeError(
    527         f"{py_operation_name}() only accepts keyword arguments."
    528     )
    529 # The "self" in this scope is referring to the BaseClient.
--> 530 return self._make_api_call(operation_name, kwargs)

File ~/.virtualenvs/SDGym/lib/python3.8/site-packages/botocore/client.py:960, in BaseClient._make_api_call(self, operation_name, api_params)
    958     error_code = parsed_response.get("Error", {}).get("Code")
    959     error_class = self.exceptions.from_code(error_code)
--> 960     raise error_class(parsed_response, operation_name)
    961 else:
    962     return parsed_response

ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
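To confirm that the local `~/.aws` config is what triggers the failure (this is a diagnostic, not a fix for the underlying bug), the AWS SDK's documented environment variables can point boto3 at empty config files for a single run. A minimal sketch; the helper name is mine, and whether SDGym then downloads successfully without any credentials is untested:

```python
import os

def neutralize_local_aws_config():
    """Point the AWS SDK's shared config/credentials lookup at empty
    files so a stale ~/.aws/credentials entry cannot be picked up.

    AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE are documented
    AWS SDK environment variables.
    """
    os.environ["AWS_SHARED_CREDENTIALS_FILE"] = os.devnull
    os.environ["AWS_CONFIG_FILE"] = os.devnull

neutralize_local_aws_config()

# Then re-run the failing call from the report, e.g.:
# import sdgym
# sdgym.benchmark_single_table(
#     synthesizers=['GaussianCopulaSynthesizer'],
#     sdv_datasets=['student_placements'],
#     timeout=22,
# )
```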
@pvk-developer pvk-developer added bug Something isn't working new Automatic label applied to new issues labels May 29, 2023
@pvk-developer pvk-developer changed the title Can't download datasets if .aws is present in your home folder Can't download datasets if .aws config is present May 29, 2023
@pvk-developer pvk-developer removed the new Automatic label applied to new issues label May 29, 2023