
feat: add tissue_type and new validation rules for tissue_ontology_term_id and cell_type_ontology_term_id #623

Merged
nayib-jose-gloria merged 8 commits into main from nayib/add-tissue-type
Sep 20, 2023


nayib-jose-gloria
Contributor

Related issues: #514, #517

Changes:

  • added categorical field 'tissue_type' as a required curator-annotated field, enforcing that its value is one of 'organoid', 'cell culture', or 'tissue'
  • enforced new cell_type_ontology_term_id rules (i.e. it MUST be a CL term but must NOT be one of the listed forbidden terms)
  • enforced new tissue_ontology_term_id rules
    - do NOT accept suffixes in term_ids, and do not append them to the resulting label
    - enforce the cell_type_ontology_term_id rules on this term_id IF tissue_type is 'cell culture'
    - enforce that the term_id is a child term of UBERON:0001062 if tissue_type is 'organoid' or 'tissue'
  • simplified the parts of the code that enforced term_id suffix logic (in the write_labels and validate modules), since suffixes no longer apply to any terms and we have no plans to support them again
  • removed validation of the schema definition from the 'validate' command (this is already handled by unit tests; we shouldn't expose code validation logic in a user- or deployment-facing function)
  • fixed the tests enforcing the schema definition (they were not being run previously due to a faulty if check) and updated them to fit the current schema definition structure
  • added tests and updated the h5ad fixtures to contain a tissue_type column with appropriate values
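The tissue_type and term ID rules above can be sketched as a small per-row check. This is an illustrative sketch only, not the validator's actual code: `check_cell_type_term`, `check_tissue_term`, `toy_is_descendant`, and `TOY_ANCESTORS` are hypothetical names, the `PREFIX:` plus seven digits ID pattern is an assumption, and the real ontology lookup is replaced by a toy ancestor table.

```python
import re
from typing import Callable, Dict, List, Set

# Values allowed for tissue_type, per the PR description.
ALLOWED_TISSUE_TYPES = {"tissue", "organoid", "cell culture"}
# 'tissue' and 'organoid' term IDs must be child terms of this UBERON term.
ANATOMICAL_ENTITY = "UBERON:0001062"
# Assumed term ID shape: PREFIX:7 digits, with no trailing suffix allowed.
TERM_ID = re.compile(r"(?P<prefix>[A-Z]+):\d{7}")


def check_cell_type_term(term_id: str, forbidden: Set[str]) -> List[str]:
    """Cell type rule: MUST be a CL term and must NOT be a forbidden term."""
    match = TERM_ID.fullmatch(term_id)
    if match is None or match.group("prefix") != "CL":
        return [f"'{term_id}' is not a valid CL term id."]
    if term_id in forbidden:
        return [f"'{term_id}' is a forbidden cell type term."]
    return []


def check_tissue_term(
    term_id: str,
    tissue_type: str,
    forbidden_cell_types: Set[str],
    is_descendant: Callable[[str, str], bool],
) -> List[str]:
    """Choose the tissue_ontology_term_id rule based on the row's tissue_type."""
    if tissue_type not in ALLOWED_TISSUE_TYPES:
        allowed = sorted(ALLOWED_TISSUE_TYPES)
        return [f"tissue_type '{tissue_type}' is not one of {allowed}."]
    if TERM_ID.fullmatch(term_id) is None:
        # Catches suffixed IDs such as "UBERON:0002048 (organoid)".
        return [f"'{term_id}' is not a valid term id; suffixes are not accepted."]
    if tissue_type == "cell culture":
        # Cell culture rows reuse the cell type rules.
        return check_cell_type_term(term_id, forbidden_cell_types)
    # 'tissue' or 'organoid' rows must descend from UBERON:0001062.
    if not is_descendant(term_id, ANATOMICAL_ENTITY):
        return [f"'{term_id}' is not a child term of {ANATOMICAL_ENTITY}."]
    return []


# Toy ontology lookup (hypothetical): maps a term to the set of its ancestors.
TOY_ANCESTORS: Dict[str, Set[str]] = {"UBERON:0002048": {ANATOMICAL_ENTITY}}


def toy_is_descendant(term_id: str, ancestor: str) -> bool:
    return ancestor in TOY_ANCESTORS.get(term_id, set())


print(check_tissue_term("UBERON:0002048", "tissue", set(), toy_is_descendant))  # []
print(check_tissue_term("UBERON:0002048 (organoid)", "organoid", set(), toy_is_descendant))
print(check_tissue_term("UBERON:0002048", "cell culture", set(), toy_is_descendant))
```

Passing `is_descendant` in as a callback keeps the rule dispatch testable without loading an ontology; a real implementation would query the ontology instead of a hard-coded table.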


codecov bot commented Sep 18, 2023

Codecov Report

Merging #623 (fb5384e) into main (1fba1b6) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #623      +/-   ##
==========================================
- Coverage   83.08%   83.01%   -0.08%     
==========================================
  Files          19       19              
  Lines        1709     1684      -25     
==========================================
- Hits         1420     1398      -22     
+ Misses        289      286       -3     
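The percentages in the report can be reproduced from the hit and line counts. This sketch assumes coverage is hits / lines × 100, rounded down to two decimals (which appears consistent with Codecov's default `round: down` setting); it only checks the arithmetic and is not Codecov's implementation. The helper names `floor2` and `coverage_pct` are hypothetical.

```python
from decimal import ROUND_FLOOR, Decimal


def floor2(value: Decimal) -> Decimal:
    """Round toward negative infinity at two decimal places."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_FLOOR)


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Unrounded line coverage as a percentage."""
    return Decimal(hits) * 100 / Decimal(lines)


base = coverage_pct(1420, 1709)  # main
head = coverage_pct(1398, 1684)  # #623

# Misses are simply lines minus hits.
print(1709 - 1420, 1684 - 1398)
print(floor2(base), floor2(head), floor2(head - base))
```

Flooring the unrounded difference (about -0.073) gives the reported -0.08%, whereas subtracting the already-rounded values (83.01 - 83.08) would give -0.07%.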
| Flag | Coverage Δ |
|---|---|
| unittests | 83.01% <100.00%> (-0.08%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| `cellxgene_schema_cli/cellxgene_schema/__init__.py` | 100.00% <100.00%> (ø) |
| `cellxgene_schema_cli/cellxgene_schema/validate.py` | 93.73% <100.00%> (+0.13%) ⬆️ |
| `...lxgene_schema_cli/cellxgene_schema/write_labels.py` | 94.77% <100.00%> (+0.46%) ⬆️ |


```diff
@@ -281,16 +281,31 @@ def test_assay_ontology_term_id(self):

     def test_cell_type_ontology_term_id(self):
         """
-        cell_type_ontology_term_id categorical with str categories. This MUST be a CL term.
+        cell_type_ontology_term_id categorical with str categories. This MUST be a CL term, and must NOT match forbidden
```
Contributor

Do we have a local glossary somewhere for these acronyms (CL)?

Contributor Author

I don't think it's an acronym; it's the prefix used for Cell Ontology terms. There's a glossary in the user-facing schema documentation.

```python
            )
        with self.subTest(forbidden_term="EFO:0000001"):
            self.validator.adata.obs.loc[
                self.validator.adata.obs.index[0], "cell_type_ontology_term_id"
```
Contributor

What is the benefit of using NumPy-style indexing, foo[x, y], as opposed to Python's foo[x][y]?

Contributor Author

This is the syntax of the .loc accessor for pandas DataFrames.

Contributor Author
nayib-jose-gloria, Sep 19, 2023

I believe this is the standard way to reassign a pandas DataFrame entry, but perhaps both ways work.
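A minimal sketch of the `.loc` pattern discussed in this thread, using a toy DataFrame as a stand-in for `adata.obs` (an assumption: the real fixture is loaded from an h5ad file and its columns are categorical, which this sketch ignores). `.loc[row_label, column_label]` addresses a single cell in one indexing call on the DataFrame itself. By contrast, for a DataFrame `foo[x][y]` selects column `x` first and then label `y` from the resulting Series, and the pandas documentation discourages assigning through such chained indexing; under copy-on-write it does not modify the original DataFrame.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical values): one row per cell, string index.
obs = pd.DataFrame(
    {"cell_type_ontology_term_id": ["CL:0000057", "CL:0000057"]},
    index=["cell_0", "cell_1"],
)

# Select the first row by its index label and the column by name, then assign in place.
obs.loc[obs.index[0], "cell_type_ontology_term_id"] = "EFO:0000001"

# Chained form, discouraged by pandas (may warn, or leave obs unchanged under copy-on-write):
# obs["cell_type_ontology_term_id"][obs.index[0]] = "EFO:0000001"

print(obs)
```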

nayib-jose-gloria merged commit 6cef195 into main on Sep 20, 2023
7 of 8 checks passed
nayib-jose-gloria deleted the nayib/add-tissue-type branch on September 20, 2023 15:41