Creating Detached RO-Crates #206

Open
dnlbauer opened this issue Nov 29, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments


dnlbauer commented Nov 29, 2024

I am trying to use the library to create a Detached RO-Crate. As a minimal example, I tried to create a crate that contains no files. I assume a minimal JSON-LD would look similar to this (obviously not valid since it has no license etc., but it serves as an example):

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "https://example.org/ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": { "@id": "https://example.com/" },
            "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" }
        },
        {
            "@id": "https://example.com/",
            "@type": "Dataset",
            (...)
        }
    ]
}

When the crate object is created from scratch with ROCrate(), the ids of the metadata and root dataset entities are already set to the default values ro-crate-metadata.json and ./. Because the ids are not mutable, the only way I found to set appropriate ids was to recreate the entities:

from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Metadata

crate = ROCrate()
crate.add(RootDataset(crate, "https://example.com/"))
crate.add(Metadata(crate, "https://example.com/ro-crate-metadata.json"))

However, I noticed some problems with this approach:

Issue 1

The metadata entity's about property does not get updated to point to the new root dataset entity. This can be fixed by passing an about property when creating the Metadata entity, but I think a better approach would be to have the library handle this internally for the two must-have entities, RootDataset and Metadata.

Issue 2

The original RootDataset and Metadata entities are not actually replaced in the ROCrate. Instead, the new entities are added on top of them. The reason is that the old and new entities do not resolve to the same hash value, so the old entities are not evicted from the internal map. The resulting JSON-LD therefore looks like this, even after setting about manually:

{
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "datePublished": "2024-11-29T14:39:49+00:00"
        },
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
        },
        {
            "@id": "https://example.com/",
            "@type": "Dataset",
            "datePublished": "2024-11-29T14:39:49+00:00"
        },
        {
            "@id": "https://example.com/ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "https://example.com/"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}
        }
    ]
}

I consider this a bug, since the add method contains code that specifically handles RootDataset and Metadata, yet it fails to override the existing entities as intended.
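To make the eviction problem concrete, here is a simplified toy model in plain Python. It is not the library's code: ToyCrate, its entities dict, and replace_root are hypothetical names invented for illustration, and the only assumption carried over from above is that entities live in a map keyed by their @id.

```python
class ToyCrate:
    """Hypothetical stand-in for a crate: entities stored in a dict keyed by @id."""

    def __init__(self):
        self.root_id = "./"
        self.entities = {
            "./": {"@id": "./", "@type": "Dataset"},
            "ro-crate-metadata.json": {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "about": {"@id": "./"},
            },
        }

    def add(self, entity):
        # Keyed insertion: only an entity with the *same* @id is overwritten,
        # so a root dataset with a new @id lands next to the old one.
        self.entities[entity["@id"]] = entity
        return entity

    def replace_root(self, entity):
        # Evict the old root explicitly, then repoint the descriptor's "about".
        self.entities.pop(self.root_id, None)
        self.root_id = entity["@id"]
        self.add(entity)
        self.entities["ro-crate-metadata.json"]["about"] = {"@id": self.root_id}
        return entity


# Plain add(): the default root "./" survives alongside the new one.
naive = ToyCrate()
naive.add({"@id": "https://example.com/", "@type": "Dataset"})
naive_ids = sorted(naive.entities)

# Explicit replacement: the old root is gone and "about" follows the new root.
fixed = ToyCrate()
fixed.replace_root({"@id": "https://example.com/", "@type": "Dataset"})
fixed_ids = sorted(fixed.entities)
```

In this toy, naive_ids still contains "./", which mirrors the duplicated entities in the JSON-LD above, while fixed_ids does not.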

Unless I am approaching detached RO-Crates totally wrong, the only way I found to build a valid detached RO-Crate was to delete the remaining additional entities manually. This led to this "hacky" approach:

from pprint import pprint
from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Metadata

crate = ROCrate()

# get a reference to the old root and metadata entities
entities_to_delete = [crate.root_dataset, crate.metadata]

# replace root and metadata entities with entities that have the correct identifiers
crate.add(RootDataset(crate, "https://example.com/"))
crate.add(Metadata(crate, "https://example.com/ro-crate-metadata.json", properties={"about": crate.root_dataset}))

# delete old entities
crate.delete(*entities_to_delete)

# generate json for detached crate
pprint(crate.metadata.generate())

Fixing the two issues would make it way easier to work with Detached RO-Crates in general.

On a side note: it would be nice to have a way to instantiate the RO-Crate with the correct ids directly, but I think it's fine the way it is if some documentation is added about how to build a detached crate, so that not everyone has to figure out all of the above on their own.

@dnlbauer dnlbauer added bug Something isn't working documentation Improvements or additions to documentation labels Nov 29, 2024
@simleo simleo added enhancement New feature or request and removed bug Something isn't working labels Dec 5, 2024
Collaborator

simleo commented Dec 5, 2024

Detached crates are a new feature in RO-Crate 1.2, which has not been released (nor finalized) yet. The library does not support them, so I have changed the "bug" label to "enhancement". Moreover, your example is not a valid 1.2 detached RO-Crate because the metadata descriptor must have an @id of ro-crate-metadata.json, see RO-Crate Metadata Descriptor and ResearchObject/ro-crate#365.

After RO-Crate 1.2 is released, we'll need to make changes to the library to support root data entities with an @id different from ./, more specifically an absolute URI. This is actually already supported when reading an existing RO-Crate, but not when creating a new one, as you have noticed.
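As a rough illustration of why a fixed descriptor @id matters: if the metadata descriptor is always ro-crate-metadata.json, the root data entity can be located by following its about reference, whatever @id the root has. The sketch below works on a plain JSON-LD dict; find_root is a hypothetical helper written for this comment, not the library's reader, and the example graph is made up.

```python
def find_root(metadata: dict) -> dict:
    """Return the root data entity of a JSON-LD metadata dict.

    Assumes the descriptor has @id "ro-crate-metadata.json" and that its
    "about" is a single {"@id": ...} reference.
    """
    graph = {entity["@id"]: entity for entity in metadata["@graph"]}
    descriptor = graph.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("no metadata descriptor with @id 'ro-crate-metadata.json'")
    root_id = descriptor["about"]["@id"]
    if root_id not in graph:
        raise ValueError(f"descriptor points to missing root entity {root_id!r}")
    return graph[root_id]


# Made-up example: the root dataset has an absolute URI as its @id.
example = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "https://example.com/"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        },
        {"@id": "https://example.com/", "@type": "Dataset"},
    ],
}

root = find_root(example)
```

With the descriptor @id from the original example (https://example.com/ro-crate-metadata.json), this lookup would raise a ValueError, because there is no entity with the expected fixed @id to start from.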

Author

dnlbauer commented Dec 5, 2024

@simleo thanks for pointing the @id out. I learn something new every day 👍

Regarding the bug label: I was mainly concerned that, when a new RootDataset entity is added, the ROCrate.add method can (depending on the entity's @id) produce a crate in which the new root dataset is not actually treated as the root entity, which is quite misleading.
This is not limited to detached RO-Crates, since having ./ as the @id of the root dataset is only a SHOULD. So maybe it should be a separate GitHub issue...

It also manifests for example with this snippet:

from pprint import pprint
from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Person

crate = ROCrate()
root_dataset = crate.add(RootDataset(crate, "folder/"))
alice = crate.add(Person(crate, "https://example.com/alice", properties={
    "name": "Alice Doe",
    "affiliation": "University of Flatland"
}))
root_dataset["author"] = alice

# The resulting metadata will not have folder/ as root dataset
# Also, alice will not be the crate author.
pprint(crate.metadata.generate())
{"@context": "https://w3id.org/ro/crate/1.1/context",
 "@graph": [{"@id": "./",
             "@type": "Dataset",
             "datePublished": "2024-12-05T14:43:04+00:00"},
            {"@id": "ro-crate-metadata.json",
             "@type": "CreativeWork",
             "about": {"@id": "./"},
             "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}},
            {"@id": "#114399a7-4ce3-4f9a-b46c-9f3328dff365",
             "@type": "Dataset",
             "author": {"@id": "https://example.com/alice"},
             "datePublished": "2024-12-05T14:43:04+00:00"},
            {"@id": "https://example.com/alice",
             "@type": "Person",
             "affiliation": "University of Flatland",
             "name": "Alice Doe"}]}

As a user, I would expect the library either to handle this correctly, especially since the add method has specific code for replacing the RootDataset, or to prevent me from doing it.

Collaborator

simleo commented Dec 10, 2024

@dnlbauer your snippet exposed a bug that affected all datasets. I fixed that in #208. Due to that bug, the root dataset that you created had an auto-generated @id of #114399a7-4ce3-4f9a-b46c-9f3328dff365 instead of folder/. Now it's possible to make things consistent by adding a few statements (change the metadata entity's about and delete the old root dataset) to your snippet:

from pprint import pprint
from rocrate.rocrate import ROCrate
from rocrate.model import RootDataset, Person

crate = ROCrate()
root_dataset = crate.add(RootDataset(crate, "folder/"))
crate.metadata["about"] = root_dataset
crate.delete("./")
alice = crate.add(Person(crate, "https://example.com/alice", properties={
    "name": "Alice Doe",
    "affiliation": "University of Flatland"
}))
root_dataset["author"] = alice
pprint(crate.metadata.generate())

Which results in:

{'@context': 'https://w3id.org/ro/crate/1.1/context',
 '@graph': [{'@id': 'ro-crate-metadata.json',
             '@type': 'CreativeWork',
             'about': {'@id': 'folder/'},
             'conformsTo': {'@id': 'https://w3id.org/ro/crate/1.1'}},
            {'@id': 'folder/',
             '@type': 'Dataset',
             'author': {'@id': 'https://example.com/alice'},
             'datePublished': '2024-12-10T11:26:12+00:00'},
            {'@id': 'https://example.com/alice',
             '@type': 'Person',
             'affiliation': 'University of Flatland',
             'name': 'Alice Doe'}]}

If data entities are added before creating the new root dataset, however, you also have to take care of the hasPart property. I think the library was developed assuming that the root dataset would not change while the crate is being built. The way add works, however, can lead to uncommon scenarios like this one. Some things will have to change, especially in order to support RO-Crate 1.2.
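To illustrate the hasPart point, here is a sketch that sidesteps the library and re-identifies the root on the generated metadata dict instead, so that every property of the old root (including hasPart) is carried over. The reroot function and the data.csv entity are hypothetical, introduced only for this example, and it assumes the descriptor's about is the only reference to the root that needs repointing.

```python
import copy


def reroot(metadata: dict, new_root_id: str) -> dict:
    """Return a copy of the metadata whose root dataset has @id new_root_id."""
    result = copy.deepcopy(metadata)  # leave the caller's dict untouched
    graph = result["@graph"]
    descriptor = next(e for e in graph if e["@id"] == "ro-crate-metadata.json")
    old_root_id = descriptor["about"]["@id"]
    for entity in graph:
        if entity["@id"] == old_root_id:
            # Renaming in place keeps hasPart, author, datePublished, etc.
            entity["@id"] = new_root_id
    descriptor["about"] = {"@id": new_root_id}
    return result


# Made-up metadata in which a data entity was added before changing the root.
before = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        },
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data.csv"}]},
        {"@id": "data.csv", "@type": "File"},
    ],
}

after = reroot(before, "folder/")
new_root = next(e for e in after["@graph"] if e["@id"] == "folder/")
```

Here new_root keeps the hasPart list of the old root, and the descriptor's about points to folder/. Other references to the old root elsewhere in the graph, if any, are not rewritten by this sketch.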
