-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce taxon constraints #3102
Conversation
Import the version of the NCBITaxon slim that includes disjointness axioms over taxon siblings. Those axioms are needed for the taxon constraint check to work.
We make sure that taxon constraints are checked as part of the QC pipeline (during the "bridge checks"). This is done by two steps. 1. We expand the RO:0002175 macros present in the ontology. This must only be done as part of the QC pipeline to avoid bloating the released artefacts with the "witness classes" created by the expansion of this macro. 2. We merge in the NCBITaxon slim with disjointness axioms. Not just an extracted module (such an extracted module has already been been imported as part of the normal imports pipeline), but the entire slim. Again, this must only be done in the QC pipeline because we do not want the entire NCBITaxon slim to be included in any relaese artefact, not even -full.
'suspensorium' is explicitly constrained to be 'NOT in mammals', but pterygoid bone has no such restriction and is apparently found (at least) in mice, so presumably pterygoid bone should not be assumed to be part of something that does not exist in mammals.
The existence of a corpus luteum is documented in some jawless vertebrates, so it cannot be said to only exist in mammals.
The pericardial cavity, like all terms related to 'heart', is strictly defined as vertebrate-specific in Uberon. Ciona do have something that is akin to a pericardium, but since they are not vertebrates, we should either: * broaden our heart-related terms (at least 'pericardium', and all terms connected to it) so that it is no longer vertebrate-specific; * create another term to represent pericardium-like structure in non-vertebrates. Until a decision is made between these two approaches, here we simply remove the statement that the pericardial cavity exists in Ciona, since it directly violates the vertebrate-specific constraint.
Some hagfish may have vertebral elements without having a full vertebral column, so the existence of vertebral elements should not imply the existence of a vertebral column.
The philtrum (UBERON:0005402) exists in species that do not have a rhinarium (UBERON:0011256), so it should not be said to be part_of some rhinarium.
Hagfishes are reported to "have independently evolved a highly laminated cerebral cortex, comparable in many ways to the cerebral cortex of mammals". So we relax the taxon constraint on 'neocortex' from mammals up to vertebrates, so as to cover fishes.
The mapping between UBERON:0006334 and HBA:4413 is most likely bogus. The Uberon term refers to a 'posterior lateral line' that is not supposed to exist in amniotes, whereas the HBA term refers to a 'lateral nucleus of the pulvinar, left', a regional part of the brain.
'axial skeletal system' is intended to apply to chordates, but is said to have as part some 'axial skeleton plus cranial skeleton', which is vertebrate-specific. We remove the offending has_part axiom.
'dermatocranium' is taxon-constrained to jawed vertebrates, but is part of the definition of 'nose' (through 'nasal skeleton') which exists throughout vertebrates. So we relax the taxon constraint on 'dermatocranium' up to vertebrates as well.
AEO:0000154 is mapped to both UBERON:0036215 (anatomical surface region) and UBERON:0002416 (integumental system), making the two Uberon terms equivalent with each other upon merging the AEO bridge. Based on the definition of the AEO term, the mapping with integumental system should be the correct one; the mapping with 'anatomical surface region' had probably been created on the basis on the lexical similarity between 'organism surface' and 'anatomical surface' (yet another example of how lexical similarity matching is harmful).
'epithelium' is restricted to Eumatazoa, but "true epithelia" have been reported in some sponges. Besides, the homology notes on 'epithelium' clearly states that epithelial tissues are found in *metazoans*, not *eumetazoans*. So we relax the taxon constraint up to metazoans.
'choroidal guanine tapetum' is restricted to elasmobranchs, but that structure seems to exist throughout cartilaginous fishes, so we relax the taxon constraint accordingly.
'skull' is said to be 'present in taxon' some Myxinidae. But Myxinidae are jawless vertebrates while skull, as defined in Uberon, refers to the association of a cranium with a mandible. Without a mandible, Myxinidae have a cranium but no skull. So we move the 'present in taxon' some Myxinidae axiom from 'skull' to 'cranium'.
'mouth' is said to overlap with 'respiratory system', but the mouth exists throughout the eumatozoans while the respiratory system is restricted to Bilateria. We remove the link to avoid restricting the mouth to Bilateria as well.
The 'olivary pretectal nucleus' has been reported to exist in salamanders, so we relax its taxon constraint from Amniota to Tetrapoda, as as to cover both Amniota and Amphibia.
An arthropod structure quite obviously cannot be part of an insect-specific structure, so we remove that link.
The 'cranium' exists throughout all craniates, so it should not be said to be part of the 'skull', which only exists in jawed vertebrates.
…ton'. The nose exists throughout vertebrates, but making the 'nasal skeleton' part of the 'facial skeleton' makes it dependent of the 'skull', which is specific to jawed vertebrates. So we break that link.
Homology notes on 'mouth' suggests the mouth evolved before the head and is therefore present in taxa where the head is not, so 'mouth' should not be dependent on 'head'.
The two remaining taxon constraint violations are caused by:
Funnily enough, those two issues are inverted mirrors of each other. The mouse-specific resource is making a bogus assertion by mapping to the primate-specific term while the human-specific resource is making a bogus assertion on the rodent-specific term. |
The taxslim-disjoint-over-in-taxon.owl subset of NCBITaxon contains *only* the disjointness axioms and nothing else, so we cannot import just that subset. We need to import both the "normal" taxslim.owl subset and the "disjoints" subset.
I think so. And by “cautious refresh”, I mean something like “no externally maintained component should ever be automatically updated as part of either the release process or the QC process.” They should only be updated upon manual, explicit action from an editor. No update at release time because during a release is not the time to fix any sudden problem caused by an external source. And no update at QC time because the point of the automated QC checks is to catch internal problems (it’s to prevent Uberon editors from inadvertently introducing errors in a PR), they should not be caused to fail for something that happened elsewhere. This is already what we do for imports: they are only refreshed when an editor explicitly runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not reviewed uberon-edit.obo (not the right expertise, but FWIW lgtm).
My main concern is this line in the Makefile
:
Are we certain we want to bump merged.owl with the entire ncbitaxondisjoints.owl file? Wont that make memory consumption go wild?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an approval MINUS changes to uberon-edit.obo.
- Ensure that @anitacaron reviews these
FYI, all the changes in uberon-edit have been explained beforehand over the past month as part of the discussion in Anita’s previous PR (#2928; see all comments starting from this one). There has been no outright objection to them. There are a couple of cases where @cmungall suggested an approach that I have, for now, not followed:
I opted for the first option in those cases, because it keeps the PR simple. I’d rather have option 2 discussed in separate, dedicated issues and then implemented in separate, dedicated PRs, which we can do after this one has been merged. |
If the online QC check fails, check if the failure is due to violations of the taxon constraints and if it is, post a comment to the PR with the reasoner's explanations for the violations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you vetted the action provider a bit, else fantastic
@matentzn Nothing fishy in the comment-pr. :) That’s the same action that has already been used by @shawntanzk’s “diff-on-demand” (aka “gogoeditdiff”) action. |
Given that the last blocker for this PR are the two TC violations caused by errors in externally maintained resources, if by next week there has been no sign of progress on the corresponding upstream tickets (here and here), I vote for forcibly removing the offending axioms in the local copies of these resources ( |
I agree! |
Remove the mapping between UBERON:0000435 and DMBA:16271 from the DMBA bridge. Remove from the hra_subset component the assertion that UBERON:8440041 is present_in_taxon some NCBITaxon:9606. The mapping and the assertion are bogus and violate the taxon constraints. They need to be fixed upstream (both the DMBA bridge and the HRA subset are maintained elsewhere), but until that happens we remove them from our local copies.
When MIR is set to false, we should not attempt to update the `components/hra_subset.owl` file.
I’ve removed the last two offending assertions from our local copy of the externally maintained |
Sorry, I'll remove the offending axiom now and need to update today's HRA subset for the last time. It won't have any other update before next year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just had some time to check it again - not test, but sanity check. Happy with this!
This PR is intended to supersede #2928. It applies cleanly to the master branch with a lean history.
It ensures that taxon constraints are enforced during the QC pipeline by: