LinkML support for future profile development #2

Open
elichad opened this issue Aug 30, 2024 · 7 comments

@elichad

elichad commented Aug 30, 2024

Expanding on a discussion with @simleo in the WRROC meeting.

At Manchester we've just started trying to use LinkML to write schemas for RO-Crate validation. LinkML schemas are YAML-based and therefore a lot easier for inexperienced users to comprehend and add to - and crucially, they can also be converted to SHACL. We think that it's important for RO-Crate profile developers to be able to write a validation schema for their profile themselves, and LinkML is a more approachable framework than SHACL to achieve this (as profile developers may not be linked data/RDF experts).

There has been interest and discussion around this previously: see ResearchObject/ro-crate#264 and linkml/linkml#1462

Thinking about how future profiles could be developed using LinkML in a way that's compatible with this validator package, there are a few possible approaches:

  • support LinkML schemas directly within the package (don't know if this is possible)
  • include LinkML schemas and their SHACL conversions within the repo, such that developers can update the LinkML and the SHACL conversion is automatically generated for the validation code to use
  • develop LinkML schemas in other repos, and include just the SHACL conversions in this repo

Please let me know your thoughts about what the best direction would be.

@multimeric

multimeric commented Sep 12, 2024

The LinkML package is published on PyPI, so it could be added as a dependency. The same goes for pySHACL.

I'm not really sure of the advantage of supporting LinkML directly, since it will always be converted to SHACL and so embracing SHACL makes more sense to me. For this reason I think it would make sense to create a separate repository with RO-Crate schemas in SHACL format, then add a shacl extra to this package that pulls in pySHACL and that repo to validate against. That way it doesn't make the installation heavier for people who are using other validation standards.

If you wanted to also support LinkML then you could create another repo with the LinkML, then add a linkml extra that pulls in the linkml package and that schema. Then when the validator runs, it does the conversion to SHACL and validates it. This could be done as a second step though, so as to work in manageable chunks.
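The optional-extras idea above could be paired with a small runtime check, sketched here in Python. This is only an illustration: the function name `available_backends` and the extra names are hypothetical, and the sketch assumes the extras would install the importable modules `pyshacl` and `linkml`.

```python
from importlib.util import find_spec

# Hypothetical mapping from an extra's name to the top-level module it installs.
OPTIONAL_BACKENDS = {"shacl": "pyshacl", "linkml": "linkml"}


def available_backends(backends=OPTIONAL_BACKENDS):
    """Return the sorted names of the extras whose module can be imported."""
    return sorted(
        name for name, module in backends.items() if find_spec(module) is not None
    )
```

A validator could call this at start-up and report a clear message when a profile needs a backend that was not installed, rather than failing on import.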

Happy to help with any of this.

@ilveroluca
Member

Hi all,

this seems like a good proposal @elichad. We were discussing it with @kikkomep and @simleo just yesterday and we’re all in agreement that being able to use it as an alternative to SHACL could make adding support for additional profiles more approachable.

Our first impression is that the best way to start integrating LinkML support would be the second approach you suggested:

  • include LinkML schemas and their SHACL conversions within the repo, such that developers can update the LinkML and the SHACL conversion is automatically generated for the validation code to use

The word “automatically” needs some discussion, though. We could have a directory within the package for the LinkML profiles, but the profiles wouldn’t actually be used at run time. Instead, we’d propose the simple solution of converting the profiles to SHACL as part of the development or packaging process. This should make it easier to test the converted profiles and fix things as necessary before release; keeping the conversion step out of run time should also help make the tool more robust and easier to debug. Since we discussed this yesterday, @multimeric joined the conversation and also made some points that we should discuss together.

To implement the LinkML-to-SHACL conversion, it looks like we can use the SHACL generator you referenced. @kikkomep ran some experiments and managed to create a LinkML validation profile, convert it to SHACL with the generator and use it within rocrate-validator. The process did expose some small bugs in the internal SHACL parsing (which have been fixed), and there is an open issue about how to manage severities. For the conversion, there could be either a dedicated script or a subcommand that runs the conversion and lays out the resulting .ttl files following the directory structure used by rocrate-validator for the validation profiles. The profile.ttl file would have to be created manually (though, if we wanted, it wouldn’t be too hard to write a little script to guide the collection of the required metadata).
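A development-time conversion step along these lines might look like the following Python sketch. It is not rocrate-validator code: `convert_profile`, `linkml_to_shacl` and the one-file-per-schema layout are hypothetical, and the use of `ShaclGenerator` from `linkml.generators.shaclgen` is an assumption that should be checked against the installed LinkML version.

```python
from pathlib import Path


def linkml_to_shacl(schema_path):
    """Convert one LinkML YAML schema to SHACL (Turtle) text.

    linkml is imported lazily so it is only required when converting.
    Assumption: the installed linkml provides ShaclGenerator with this usage.
    """
    from linkml.generators.shaclgen import ShaclGenerator

    return ShaclGenerator(str(schema_path)).serialize()


def convert_profile(linkml_dir, shacl_dir, convert=linkml_to_shacl):
    """Write one .ttl file into shacl_dir for every .yaml schema in linkml_dir."""
    shacl_dir = Path(shacl_dir)
    shacl_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for schema in sorted(Path(linkml_dir).glob("*.yaml")):
        target = shacl_dir / (schema.stem + ".ttl")
        target.write_text(convert(schema), encoding="utf-8")
        written.append(target)
    return written
```

Passing the converter in as a parameter keeps the file layout logic testable without LinkML installed, which fits the goal of checking converted profiles before release.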

As I was saying, one thing that needs some careful thinking is how to attach severities (MUST, SHOULD, MAY) to the LinkML checks. One solution could be to use annotations, but the conversion script/subcommand would then need to parse that information out of the resulting .ttl and use it to lay out the checks appropriately in the directory structure. An alternative, still using LinkML annotations, would be to implement additional SHACL parsing in rocrate-validator to extract the severity annotations (there's already some parsing to extract metadata). We'd be happy to hear about better or simpler alternatives.
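For the first option, the conversion step would need a rule for turning a severity into a location in the directory structure. A minimal Python sketch of such a rule follows. The mapping and the lower-case folder names are assumptions made for illustration, not taken from rocrate-validator; the only facts relied on are the SHACL namespace and that SHACL treats a missing severity as sh:Violation.

```python
SH = "http://www.w3.org/ns/shacl#"

# Assumed mapping from SHACL severity IRIs to requirement-level folders.
SEVERITY_TO_FOLDER = {
    SH + "Violation": "must",
    SH + "Warning": "should",
    SH + "Info": "may",
}


def folder_for(severity_iri=None):
    """Pick the folder for a check; no or unknown severity falls back to 'must'."""
    return SEVERITY_TO_FOLDER.get(severity_iri, "must")
```

For example, `folder_for(SH + "Warning")` selects the folder for SHOULD-level checks, while a shape with no severity annotation would be treated as a MUST.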

As for helping, we're happy to receive and support PRs on this issue. Let's just agree on the approach before anyone starts hacking :-)

@elichad
Author

elichad commented Sep 12, 2024

Next steps after discussion at the Workflow Run RO-Crate meeting today:

Make a proof of concept LinkML-SHACL integration, to check that LinkML is a viable option for writing profiles:

  • write a LinkML schema for one of the profiles already covered (Workflow RO-Crate or Process Run Crate)
  • write script to convert that LinkML to SHACL and put the SHACL in the right place in the folder structure
    • in particular, figure out how to handle severity during this conversion, as currently severity is represented in the folder structure
    • it may be better to support SHACL severity directly
  • check that the generated SHACL passes all the same tests that the existing SHACL profile does

After that (assuming LinkML is shown to be viable), we'll look at adding validation for the Five Safes Crate profile with this LinkML-SHACL approach, as this would be useful for our team at Manchester.

We'll work on this on the Manchester side. I've just made a fork, from which we'll contribute back: https://github.com/eScienceLab/rocrate-validator

@kikkomep
Member

PR #8 introduces support for the severity property in both SHACL and Python requirement checks. Specifically, SHACL requirements can directly use the SHACL sh:severity property (with the sh prefix bound to http://www.w3.org/ns/shacl#) to define the severity of a constraint. The folder structure typically used in validation profiles, in which the must, should, and optional folders assign severity levels to the requirement checks, is still supported but not mandatory.

This feature should simplify the process of converting a LinkML specification to SHACL, as the output of the conversion can be used directly by the validator without creating the folder structure mentioned above. From my experiments, simply annotating the LinkML slots with the sh:severity property should be sufficient to assign the correct severity level to each constraint. If needed, you can also use annotations to customize the name and description of a requirement check, as well as the corresponding error message, as shown in the following example:

Person:
    is_a: NamedThing
    description: >-
      A person....
    class_uri: schema:Person
    slots:
      - primary_email
    slot_usage:
      primary_email:
        pattern: "^\\S+@[\\S+\\.]+\\S+"
        recommended: true
        annotations:
          sh:severity: sh:Warning
          sh:name: "Primary Email Validation"
          sh:description: "This requirement checks the validity of the primary email address."
          sh:message: "The primary email address is not valid."
...

By using the LinkML-SHACL conversion tool with the --include-annotations option to include SHACL annotations in the generated SHACL files, you should obtain a SHACL shape that can be directly used by the validator:

schema1:PersonTest a sh:NodeShape ;
    rdfs:subClassOf personinfo:NamedThing ;
    sh:closed true ;
    sh:description "A person...." ;
    sh:ignoredProperties ( rdf:type ) ;
    sh:property [ 
      sh:datatype xsd:string ;
      sh:description "This requirement checks the validity of the primary email address."^^xsd:string ;
      sh:maxCount 1 ;
      sh:message "Primary email address is not valid."^^xsd:string ;
      sh:name "Primary Email Validation"^^xsd:string ;
      sh:nodeKind sh:Literal ;
      sh:order 0 ;
      sh:path schema1:email ;
      sh:pattern "^\\S+@[\\S+\\.]+\\S+" ;
      sh:severity sh:Warning 
    ],
  ...

All that remains is to place it in the appropriate folder within your validation profile.
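Before relying on the generated shape, the slot's pattern can be sanity-checked on a few sample values. The Python sketch below does this with the standard `re` module. It is only an approximation: Python's regex dialect is not identical to the one a SHACL engine uses for sh:pattern, and the helper name `looks_like_primary_email` is hypothetical.

```python
import re

# Pattern copied from the primary_email slot above, with the YAML escaping removed.
PRIMARY_EMAIL = re.compile(r"^\S+@[\S+\.]+\S+")


def looks_like_primary_email(value):
    """True if the pattern matches; search() is used because the pattern has no end anchor."""
    return PRIMARY_EMAIL.search(value) is not None
```

With this helper, `looks_like_primary_email("user@example.org")` is true, while `"not-an-email"` and `"a@b"` are rejected. Note that the character class `[\S+\.]` already accepts any non-whitespace character, so the pattern is fairly permissive about what follows the `@`.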

@elichad
Author

elichad commented Sep 19, 2024

@kikkomep amazing! Thank you for implementing this so quickly!

@multimeric

Hi all, have there been any recent updates on LinkML implementation here?

@elichad
Author

elichad commented Nov 11, 2024

@multimeric I'm still working on this. Here's a branch on my fork where I have one check working in LinkML as a proof of concept (from Workflow RO-Crate, checking that the root dataset has mainEntity): https://github.com/eScienceLab/rocrate-validator/tree/feat/linkml
The tests fail because they're not set up correctly, but the check works when run manually on individual crates from the test data.
It's been slow going, as I've had a lot of travel recently and not much time to put in, plus there's been a lot of fiddling and debugging as I learn the intricacies of LinkML and its SHACL conversion tool.
