FEATURE: Incompatible queries for linking artifact tracking and runs #1096

Open
ljstrnadiii opened this issue Nov 17, 2022 · 3 comments

@ljstrnadiii

ljstrnadiii commented Nov 17, 2022

Describe the bug

This is not necessarily a bug, but it prevents me from taking advantage of the links from datasets in project metadata to runs, and vice versa.

When you log an artifact to project metadata, the UI displays the number of runs using that artifact. You can also go to the dataset registered within a run, and it shows the number of runs using that dataset. The problem is that the two views build their run queries in two different, possibly incompatible ways.

Reproduction

Log an artifact to project metadata:
project_meta['datasets/some/sub/dir/dataset'].track_files(...)
Then link the dataset to a run:
run['dataset'] = project_meta['datasets/some/sub/dir/dataset'].fetch()

When you navigate to the project metadata and click the "runs used" link, it queries runs by datasets/some/sub/dir/dataset, which means a run must have a field at the key datasets/some/sub/dir/dataset to match. In our runs we call the field just dataset, so that we don't have to click through the nested structure we want in our project metadata. As a result, the "runs used" link in the metadata cannot show which runs use the artifact.

I can always write a query manually, but the linked query is misleading.
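To make the mismatch concrete, here is a minimal sketch in plain Python. It is not the Neptune client API: runs are modeled as dicts mapping field paths to artifact hashes, the hash value is a made-up placeholder, and `runs_used_by_path` only mimics the path-based lookup described in this issue. Matching on the hash under any key (the manual workaround) finds the runs that the path-based lookup misses.

```python
# Hypothetical model of the "runs used" mismatch; NOT the Neptune client API.
# Runs are plain dicts mapping field paths to artifact hashes.
from typing import Dict, List

PROJECT_PATH = "datasets/some/sub/dir/dataset"
ARTIFACT_HASH = "abc123"  # placeholder hash, for illustration only

# Project-level metadata tracks the artifact under a nested path.
project_meta: Dict[str, str] = {PROJECT_PATH: ARTIFACT_HASH}

# Runs link the same artifact under a short key, "dataset".
runs: Dict[str, Dict[str, str]] = {
    "RUN-1": {"dataset": ARTIFACT_HASH},
    "RUN-2": {"dataset": ARTIFACT_HASH},
}


def runs_used_by_path(path: str, artifact_hash: str) -> List[str]:
    """Mimic the link as described in the issue: match on the same path."""
    return [rid for rid, fields in runs.items() if fields.get(path) == artifact_hash]


def runs_used_by_hash(artifact_hash: str) -> List[str]:
    """Manual workaround: match the hash under any field path."""
    return [rid for rid, fields in runs.items() if artifact_hash in fields.values()]


print(len(runs_used_by_path(PROJECT_PATH, project_meta[PROJECT_PATH])))  # 0
print(runs_used_by_hash(project_meta[PROJECT_PATH]))  # ['RUN-1', 'RUN-2']
```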

Expected behavior

I suppose it would be hard for the query to know which key you linked the dataset to in the run, which makes generating the query behind the "runs used" link pretty challenging. Nonetheless, I would expect the "runs used" link on the dataset tracked in project metadata not to tell me it is used by 0 runs, because that is wrong in this case.

This is partly user error, but there is also an implicit assumption, not obvious to the user, about which keys the run must use to link to a tracked artifact. I can always create a query to find all runs after copy-pasting the hash, and make sure I am consistent about what I call "dataset" in the run.

To solve this, could we specify which key we promise to use when linking tracked artifacts in runs? Something like
project_meta['datasets/some/sub/dir/dataset'].track_files("s3://...", run_key='training_dataset')
and then update the hyperlink to query with training_dataset = <hash>?

An example of linking the artifact to a run could then just be:

run['training_dataset'] = project_meta['datasets/some/sub/dir/dataset'].fetch()

By "hyperlink to query" I mean the "0 runs" hyperlink you see in the screenshot below:
[Screenshot 2022-11-17 at 11:18:41 AM]
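The `run_key` proposal above could behave roughly like the sketch below. Everything here is hypothetical: `TrackedArtifact`, the `run_key` field, and the query string format are illustrations of the requested feature, not part of the Neptune client, and the hash is a placeholder. The point is that the link would be built from the promised run key rather than from the artifact's path in project metadata.

```python
# Hypothetical sketch of the proposed `run_key` feature; NOT the Neptune API.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrackedArtifact:
    path: str           # where the artifact lives in project metadata
    artifact_hash: str  # hash of the tracked files (placeholder value)
    run_key: str        # key that runs promise to use when linking it

    def runs_used_query(self) -> str:
        # Build the link target from run_key instead of the metadata path.
        return f"{self.run_key} = {self.artifact_hash}"

    def runs_used(self, runs: Dict[str, Dict[str, str]]) -> List[str]:
        # Count only runs that store the artifact under the promised key.
        return [
            run_id
            for run_id, fields in runs.items()
            if fields.get(self.run_key) == self.artifact_hash
        ]


artifact = TrackedArtifact(
    path="datasets/some/sub/dir/dataset",
    artifact_hash="abc123",
    run_key="training_dataset",
)
runs = {"RUN-1": {"training_dataset": "abc123"}, "RUN-2": {"dataset": "abc123"}}

print(artifact.runs_used_query())  # training_dataset = abc123
print(artifact.runs_used(runs))    # ['RUN-1']
```

Runs that ignore the promised key (RUN-2 here) would still be missed, so the proposal makes the convention explicit rather than removing it.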

ljstrnadiii changed the title from "BUG: Incompatible queries for artifact and run linking" to "BUG: Incompatible queries for linking artifact tracking and runs" on Nov 17, 2022
@Blaizzy
Contributor

Blaizzy commented Nov 21, 2022

Hi @ljstrnadiii

Thanks for the feedback!

I definitely see your point. This behaviour is strange.

I've passed this feedback to the engineering team so they can investigate it, and I'll keep you updated.

@Blaizzy
Contributor

Blaizzy commented Nov 23, 2022

Hey @ljstrnadiii

I just heard from the product team: this is a known issue, and it's in the backlog.
Unfortunately, we don't have an ETA for the fix, but I'll let you know as soon as it's released.

Workaround ✅

Make sure that all runs in the project and the project-level metadata use the same namespace for artifacts.

For example:
Project-level to run

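import neptune.new as neptune

# Track the files in project-level metadata under "dataset"
project = neptune.init_project()
project["dataset"].track_files("path_to_files")

# Link the same artifact to a run under the same "dataset" namespace
run = neptune.init_run()
run["dataset"] = project["dataset"].fetch()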

Run to run

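import neptune.new as neptune

# Track the files in one run under "dataset/v1.0"
run = neptune.init_run()
run["dataset/v1.0"].track_files("path_to_files")

# Link the same artifact to a second run under the same namespace
run_2 = neptune.init_run()
run_2["dataset/v1.0"] = run["dataset/v1.0"].fetch()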
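As a rough illustration of the rule behind this workaround, the sketch below flags runs that link a project artifact under a different namespace than the one used in project metadata. It assumes plain dicts stand in for Neptune objects (field path to artifact hash) and uses placeholder hashes; `mismatched_runs` is a hypothetical helper, not a Neptune function.

```python
# Hypothetical consistency check for the workaround; NOT the Neptune API.
# Plain dicts map field paths to artifact hashes (placeholder values).
from typing import Dict, List


def mismatched_runs(
    project_fields: Dict[str, str], runs: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return a note for each run that links a project artifact under another path."""
    problems = []
    for run_id, fields in runs.items():
        for path, artifact_hash in fields.items():
            # Paths where the project metadata tracks this same hash.
            tracked_at = [p for p, h in project_fields.items() if h == artifact_hash]
            # The hash is known to the project but stored under a different path.
            if tracked_at and path not in tracked_at:
                problems.append(f"{run_id}: '{path}' should be '{tracked_at[0]}'")
    return problems


project_fields = {"dataset": "abc123"}
runs = {
    "RUN-1": {"dataset": "abc123"},           # consistent with the project
    "RUN-2": {"training_dataset": "abc123"},  # same artifact, other namespace
}
print(mismatched_runs(project_fields, runs))
# ["RUN-2: 'training_dataset' should be 'dataset'"]
```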

Blaizzy self-assigned this on Nov 23, 2022
Blaizzy closed this as completed on Jan 30, 2023
SiddhantSadangi changed the title from "BUG: Incompatible queries for linking artifact tracking and runs" to "FEATURE: Incompatible queries for linking artifact tracking and runs" on Aug 18, 2023
SiddhantSadangi assigned twolodzko and unassigned Blaizzy on Aug 18, 2023
@twolodzko
Contributor

@ljstrnadiii Just to let you know, in the coming months we will be working on improvements to artifacts that should solve this issue, so that they behave the way you would expect. If you have any more comments, input, or questions about artifacts, feel free to contact me at [email protected]. The scope of the upcoming changes is still being clarified, so we are looking for anything that could help us improve the user experience.


6 participants