Regarding the old_location options #156

Open
JoanneBogart opened this issue Oct 15, 2024 · 2 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

@JoanneBogart (Collaborator)

  1. If old_location has not been specified, then relative_path should be required.
  2. If old_location has not been specified, it is not necessarily an error for relative_path to match the value of another, already-registered dataset. This can actually happen with GCRCatalogs datasets. The code should perhaps issue a warning, but it should still register the new dataset.
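A minimal sketch of the two rules proposed above, assuming a hypothetical `validate_registration` helper that receives the relative paths of already-registered datasets as a set. The function name, its parameters, and the use of Python's standard `warnings` module are illustrative assumptions, not the dataregistry API. The error branch for the case where old_location is given assumes the existing overwrite protection is kept.

```python
import warnings
from typing import Optional


def validate_registration(
    relative_path: Optional[str],
    old_location: Optional[str],
    registered_paths: set[str],
) -> None:
    """Hypothetical check implementing the two rules proposed in this issue."""
    if old_location is None:
        # Rule 1: no files are copied, so relative_path must point at the existing data.
        if relative_path is None:
            raise ValueError("relative_path is required when old_location is not given")
        # Rule 2: a shared relative_path is allowed (e.g. GCRCatalogs datasets) but flagged.
        if relative_path in registered_paths:
            warnings.warn(
                f"relative_path {relative_path!r} is already registered; "
                "registering the new dataset anyway"
            )
    elif relative_path in registered_paths:
        # Files would be copied into place, so a collision could overwrite existing data.
        raise ValueError(f"relative_path {relative_path!r} is already registered")
```

Under this sketch a duplicate relative_path only produces a warning when nothing is being copied, and stays a hard error otherwise.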
@JoanneBogart added the bug and enhancement labels on Oct 15, 2024
@stuartmcalpine (Collaborator)

Started addressing this in #158.

> If old_location has not been specified, then relative_path should be required.

I have added this check in the PR (only when location_type=dataregistry, since the other location types don't handle data).

> If old_location has not been specified, it's not necessarily an error for relative_path to match the value for another, already-registered dataset. This actually can happen for GCRCatalogs datasets. The code should perhaps put out a warning but should go ahead and register the new dataset.

It only raises this error when location_type=dataregistry, to make sure people can't overwrite other people's data. I'm not sure what you mean by GCR datasets in this context.
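A rough sketch of the behaviour described in this reply, assuming a hypothetical `check_location` helper. Both checks are skipped unless location_type is "dataregistry", and any reuse of a registered relative_path is rejected. The helper name, its parameters, and the placeholder location type used in the usage line are assumptions; only the "dataregistry" value comes from the thread.

```python
from typing import Optional


def check_location(
    location_type: str,
    relative_path: Optional[str],
    old_location: Optional[str],
    registered_paths: set[str],
) -> None:
    """Hypothetical version of the checks described for PR #158."""
    # Only "dataregistry" entries have data managed by the registry; other types are skipped.
    if location_type != "dataregistry":
        return
    if old_location is None and relative_path is None:
        raise ValueError("relative_path is required when old_location is not given")
    # Any reuse of a relative_path is rejected so one user cannot overwrite another's data.
    if relative_path in registered_paths:
        raise ValueError(f"relative_path {relative_path!r} is already registered")


# Hypothetical usage: a non-"dataregistry" entry passes without any checks.
check_location("some_other_type", None, None, {"shared/dir"})
```

With this version, a second dataset that reuses a relative_path is rejected even when old_location is None, which is exactly the GCRCatalogs case discussed in the issue.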

@JoanneBogart (Collaborator, Author)

JoanneBogart commented Oct 19, 2024

When registering a GCRCatalog-type catalog, the code gets as much information as it can from its config file. There are various keywords in those files which can be used to say "the dataset is in this directory". But there are also ways in the config file to narrow down exactly which files constitute the dataset. See for example
https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/catalog_configs/dc2_redmagic_run2.2i_dr6_wfd_v0.8.1_highdens.yaml and
https://github.com/LSSTDESC/gcr-catalogs/blob/master/GCRCatalogs/catalog_configs/dc2_redmagic_run2.2i_dr6_wfd_v0.8.1_highlum.yaml
Both have the same value for the keyword catalog_root_dir, which is essentially our relative_path, but another keyword, catalog_path_template, can be used to select a subset of the files in that directory. I thought of three different ways we could handle this, but by far the simplest is to say that when dataregistry is not copying files (so there is no danger of overwriting anything), we assume the user knows what they're doing and allow them to use the same relative_path.
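To illustrate why a shared catalog_root_dir need not mean a shared dataset, the sketch below treats catalog_path_template as a glob pattern applied to file names inside the root directory. That interpretation, the `select_files` helper, the root directory, and the file names are all hypothetical; GCRCatalogs readers may interpret the template differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(root_dir: str, template: str, listing: list[str]) -> list[str]:
    """Return paths under root_dir whose file names match a glob-style template (hypothetical)."""
    return [
        str(PurePosixPath(root_dir) / name)
        for name in listing
        if fnmatchcase(name, template)
    ]


# Hypothetical directory contents shared by two catalog configs.
root = "some/shared/catalog_root_dir"
listing = ["sample_a_0.fits", "sample_a_1.fits", "sample_b_0.fits"]

# Same root (i.e. the same relative_path) for both datasets...
files_a = select_files(root, "sample_a_*.fits", listing)
files_b = select_files(root, "sample_b_*.fits", listing)
# ...but the templates pick disjoint sets of files.
```

Here both "datasets" would be registered with the same relative_path while referring to different files, and since nothing is copied, nothing can be overwritten.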
