Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce taxon constraints #3102

Merged
merged 32 commits into from
Nov 10, 2023
Merged

Enforce taxon constraints #3102

merged 32 commits into from
Nov 10, 2023

Conversation

gouttegd
Copy link
Collaborator

This PR is intended to supersede #2928. It applies cleanly to the master branch with a lean history.

It ensures that taxon constraints are enforced during the QC pipeline by:

  1. importing the version of the NCBITaxon slim that contains disjointness axioms between taxon siblings;
  2. expanding the present in taxon macro during QC;
  3. merging the full NCBITaxon slim (all taxa) during QC;
  4. fixing all known existing violations of the taxon constraints, save for two that have an external cause and will need to be fixed upstream.

gouttegd and others added 24 commits October 27, 2023 08:23
Import the version of the NCBITaxon slim that includes disjointness
axioms over taxon siblings. Those axioms are needed for the taxon
constraint check to work.
We make sure that taxon constraints are checked as part of the QC
pipeline (during the "bridge checks"). This is done by two steps.

1. We expand the RO:0002175 macros present in the ontology. This must
   only be done as part of the QC pipeline to avoid bloating the
  released artefacts with the "witness classes" created by the expansion
  of this macro.

2. We merge in the NCBITaxon slim with disjointness axioms. Not just an
   extracted module (such an extracted module has already been been
   imported as part of the normal imports pipeline), but the entire
   slim. Again, this must only be done in the QC pipeline because we do
   not want the entire NCBITaxon slim to be included in any relaese
   artefact, not even -full.
'suspensorium' is explicitly constrained to be 'NOT in mammals',
but pterygoid bone has no such restriction and is apparently found (at
least) in mice, so presumably pterygoid bone should not be assumed to be
part of something that does not exist in mammals.
The existence of a corpus luteum is documented in some jawless
vertebrates, so it cannot be said to only exist in mammals.
The pericardial cavity, like all terms related to 'heart', is strictly
defined as vertebrate-specific in Uberon. Ciona do have something that
is akin to a pericardium, but since they are not vertebrates, we should
either:

* broaden our heart-related terms (at least 'pericardium', and all terms
  connected to it) so that it is no longer vertebrate-specific;
* create another term to represent pericardium-like structure in
  non-vertebrates.

Until a decision is made between these two approaches, here we simply
remove the statement that the pericardial cavity exists in Ciona, since
it directly violates the vertebrate-specific constraint.
Some hagfish may have vertebral elements without having a full vertebral
column, so the existence of vertebral elements should not imply the
existence of a vertebral column.
The philtrum (UBERON:0005402) exists in species that do not have a
rhinarium (UBERON:0011256), so it should not be said to be part_of some
rhinarium.
Hagfishes are reported to "have independently evolved a highly laminated
cerebral cortex, comparable in many ways to the cerebral cortex of
mammals". So we relax the taxon constraint on 'neocortex' from mammals
up to vertebrates, so as to cover fishes.
The mapping between UBERON:0006334 and HBA:4413 is most likely bogus.
The Uberon term refers to a 'posterior lateral line' that is not
supposed to exist in amniotes, whereas the HBA term refers to a
'lateral nucleus of the pulvinar, left', a regional part of the brain.
'axial skeletal system' is intended to apply to chordates, but is said
to have as part some 'axial skeleton plus cranial skeleton', which is
vertebrate-specific. We remove the offending has_part axiom.
'dermatocranium' is taxon-constrained to jawed vertebrates, but is part
of the definition of 'nose' (through 'nasal skeleton') which exists
throughout vertebrates. So we relax the taxon constraint on
'dermatocranium' up to vertebrates as well.
AEO:0000154 is mapped to both UBERON:0036215 (anatomical surface region)
and UBERON:0002416 (integumental system), making the two Uberon terms
equivalent with each other upon merging the AEO bridge. Based on the
definition of the AEO term, the mapping with integumental system should
be the correct one; the mapping with 'anatomical surface region' had
probably been created on the basis on the lexical similarity between
'organism surface' and 'anatomical surface' (yet another example of how
lexical similarity matching is harmful).
'epithelium' is restricted to Eumatazoa, but "true epithelia" have been
reported in some sponges. Besides, the homology notes on 'epithelium'
clearly states that epithelial tissues are found in *metazoans*, not
*eumetazoans*. So we relax the taxon constraint up to metazoans.
'choroidal guanine tapetum' is restricted to elasmobranchs, but that
structure seems to exist throughout cartilaginous fishes, so we relax
the taxon constraint accordingly.
'skull' is said to be 'present in taxon' some Myxinidae. But Myxinidae
are jawless vertebrates while skull, as defined in Uberon, refers to the
association of a cranium with a mandible. Without a mandible, Myxinidae
have a cranium but no skull. So we move the 'present in taxon' some
Myxinidae axiom from 'skull' to 'cranium'.
'mouth' is said to overlap with 'respiratory system', but the mouth
exists throughout the eumatozoans while the respiratory system is
restricted to Bilateria. We remove the link to avoid restricting the
mouth to Bilateria as well.
The 'olivary pretectal nucleus' has been reported to exist in
salamanders, so we relax its taxon constraint from Amniota to Tetrapoda,
as as to cover both Amniota and Amphibia.
An arthropod structure quite obviously cannot be part of an
insect-specific structure, so we remove that link.
The 'cranium' exists throughout all craniates, so it should not be said
to be part of the 'skull', which only exists in jawed vertebrates.
…ton'.

The nose exists throughout vertebrates, but making the 'nasal skeleton'
part of the 'facial skeleton' makes it dependent of the 'skull', which
is specific to jawed vertebrates. So we break that link.
Homology notes on 'mouth' suggests the mouth evolved before the head and
is therefore present in taxa where the head is not, so 'mouth' should
not be dependent on 'head'.
@gouttegd gouttegd self-assigned this Oct 27, 2023
@gouttegd
Copy link
Collaborator Author

gouttegd commented Oct 27, 2023

The two remaining taxon constraint violations are caused by:

Funnily enough, those two issues are inverted mirrors of each other. The mouse-specific resource is making a bogus assertion by mapping to the primate-specific term while the human-specific resource is making a bogus assertion on the rodent-specific term.

The taxslim-disjoint-over-in-taxon.owl subset of NCBITaxon contains
*only* the disjointness axioms and nothing else, so we cannot import
just that subset. We need to import both the "normal" taxslim.owl subset
and the "disjoints" subset.
@gouttegd
Copy link
Collaborator Author

I guess the best strategy is cautious refresh and retain the right to
override?

I think so. And by “cautious refresh”, I mean something like “no externally maintained component should ever be automatically updated as part of either the release process or the QC process.” They should only be updated upon manual, explicit action from an editor.

No update at release time because during a release is not the time to fix any sudden problem caused by an external source. And no update at QC time because the point of the automated QC checks is to catch internal problems (it’s to prevent Uberon editors from inadvertently introducing errors in a PR), they should not be caused to fail for something that happened elsewhere.

This is already what we do for imports: they are only refreshed when an editor explicitly runs make refresh-imports, not at release time or QC time. We should generalise that for other externally maintained components, not just imports.

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not reviewed uberon-edit.obo (not the right expertise, but FWIW lgtm).

My main concern is this line in the Makefile:

https://github.com/obophenotype/uberon/blob/dc3aa6a28753639eb2b3efe9e433c02ede6c628e/src/ontology/Makefile#L534C1-L537C1

Are we certain we want to bump merged.owl with the entire ncbitaxondisjoints.owl file? Wont that make memory consumption go wild?

src/ontology/uberon-odk.yaml Show resolved Hide resolved
src/ontology/uberon.Makefile Show resolved Hide resolved
matentzn
matentzn previously approved these changes Nov 1, 2023
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an approval MINUS changes to uberon-edit.obo.

@gouttegd
Copy link
Collaborator Author

gouttegd commented Nov 1, 2023

FYI, all the changes in uberon-edit have been explained beforehand over the past month as part of the discussion in Anita’s previous PR (#2928; see all comments starting from this one). There has been no outright objection to them.

There are a couple of cases where @cmungall suggested an approach that I have, for now, not followed:

This is an example we see a bit where we have a choice between 1. Losing a
useful inference in many species for the sake of one; 2. Making an awkward
class for the one species and leaving the others alone

We often do 1 when we should do 2. In this case the element would be
curiously floating with no part parent

I opted for the first option in those cases, because it keeps the PR simple. I’d rather have option 2 discussed in separate, dedicated issues and then implemented in separate, dedicated PRs, which we can do after this one has been merged.

If the online QC check fails, check if the failure is due to violations
of the taxon constraints and if it is, post a comment to the PR with the
reasoner's explanations for the violations.
Copy link

github-actions bot commented Nov 2, 2023

This PR violates some taxon constraints. Here is what the reasoner has to say:

http://purl.obolibrary.org/obo/UBERON_8440041 in taxon http://purl.obolibrary.org/obo/NCBITaxon_9606 SubClassOf Nothing

Axiom Impact

Axioms used 1 times

Ontologies used:

matentzn
matentzn previously approved these changes Nov 2, 2023
Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you vetted the action provider a bit, else fantastic

@gouttegd
Copy link
Collaborator Author

gouttegd commented Nov 2, 2023

@matentzn Nothing fishy in the comment-pr. :) That’s the same action that has already been used by @shawntanzk’s “diff-on-demand” (aka “gogoeditdiff”) action.

@gouttegd
Copy link
Collaborator Author

gouttegd commented Nov 2, 2023

Given that the last blocker for this PR are the two TC violations caused by errors in externally maintained resources, if by next week there has been no sign of progress on the corresponding upstream tickets (here and here), I vote for forcibly removing the offending axioms in the local copies of these resources (src/ontology/bridge/uberon-bridge-to-dmba.owl and src/ontology/components/hra_subset.owl) and merge.

@matentzn
Copy link
Contributor

matentzn commented Nov 2, 2023

Given that the last blocker for this PR are the two TC violations caused by errors in externally maintained resources, if by next week thttps://github.com/obophenotype/ABA_Uberon/issues/36 has been no sign of progress on the corresponding upstream tickets (hubmapconsortium/ccf-validation-tools#293 and here), I vote for forcibly removing the offending axioms in the local copies of these resources (src/ontology/bridge/uberon-bridge-to-dmba.owl and src/ontology/components/hra_subset.owl) and merge.

I agree!

Remove the mapping between UBERON:0000435 and DMBA:16271 from the DMBA
bridge. Remove from the hra_subset component the assertion that
UBERON:8440041 is present_in_taxon some NCBITaxon:9606.

The mapping and the assertion are bogus and violate the taxon
constraints. They need to be fixed upstream (both the DMBA bridge and
the HRA subset are maintained elsewhere), but until that happens we
remove them from our local copies.
When MIR is set to false, we should not attempt to update the
`components/hra_subset.owl` file.
@gouttegd
Copy link
Collaborator Author

gouttegd commented Nov 8, 2023

I’ve removed the last two offending assertions from our local copy of the externally maintained uberon-bridge-to-dmba and hra_subset resources as discussed above. This is now ready.

@gouttegd gouttegd requested a review from matentzn November 9, 2023 09:01
@anitacaron
Copy link
Collaborator

Sorry, I'll remove the offending axiom now and need to update today's HRA subset for the last time. It won't have any other update before next year.

Copy link
Contributor

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just had some time to check it again - not test, but sanity check. Happy with this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants