From 0392f2b0c25bd78884abbb455eee623325bd37a3 Mon Sep 17 00:00:00 2001
From: Brian Raymor
Date: Fri, 22 Nov 2024 14:43:39 -0800
Subject: [PATCH] Added c. elegans (#1126)

---
 schema/drafts/5.2.1-experimental.md | 210 ++++++++++++++++++++--------
 1 file changed, 151 insertions(+), 59 deletions(-)

diff --git a/schema/drafts/5.2.1-experimental.md b/schema/drafts/5.2.1-experimental.md
index 4275dfe6..0190c884 100644
--- a/schema/drafts/5.2.1-experimental.md
+++ b/schema/drafts/5.2.1-experimental.md
@@ -8,7 +8,7 @@ Version: 5.2.1-experimental

 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED" "MAY", and "OPTIONAL" in this document are to be interpreted as described in [BCP 14](https://tools.ietf.org/html/bcp14), [RFC2119](https://www.rfc-editor.org/rfc/rfc2119.txt), and [RFC8174](https://www.rfc-editor.org/rfc/rfc8174.txt) when, and only when, they appear in all capitals, as shown here.

-This draft is limited to **additions** or **modifications** to [schema 5.2.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md). If a 5.2.0 reference does not appear in this document, then no schema change is required. The following **temporary** constraints for *Danio rerio* and *Drosophila melanogaster* are specified:
+This draft is limited to **additions** or **modifications** to [schema 5.2.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/5.2.0/schema.md). If a 5.2.0 reference does not appear in this document, then no schema change is required. The following **temporary** constraints are specified:

 * The `organism_ontology_term_id` MUST be the same for all observations.
 * The `tissue_type` MUST be `'tissue'` for all observations.
@@ -24,6 +24,8 @@ The following ontology dependencies are *pinned* for this version of the schema.

 | Ontology | OBO Prefix | Release | Download |
 |:--|:--|:--|:--|
+| [C. elegans Development Ontology] | WBls | [ 2024-09-26 Wormbase WS295](https://github.com/obophenotype/c-elegans-development-ontology/blob/vWS295) | [wbls.owl] |
+| [C. elegans Gross Anatomy Ontology] | WBbt | [2024-09-24 Wormbase WS295](https://github.com/obophenotype/c-elegans-gross-anatomy-ontology/blob/v2024-09-24) | [wbbt.owl] |
 | [Cell Ontology] | CL | [2024-08-16] | [cl.owl]|
 | [Drosophila Anatomy Ontology] | FBbt | [2024-08-08](https://github.com/FlyBase/drosophila-anatomy-developmental-ontology/releases/tag/v2024-08-08) | [fbbt.owl] |
 | [Drosophila Development Ontology] | FBdv | [2024-08-07](https://github.com/FlyBase/drosophila-developmental-ontology/releases/tag/v2024-08-07) | [fbdv.owl] |
@@ -38,6 +40,11 @@ The following ontology dependencies are *pinned* for this version of the schema.
 | [Zebrafish Anatomy Ontology] | ZFA<br>ZFS | [2022-12-09] | [zfa.owl] |
 | | | | |

+[C. elegans Development Ontology]: https://obofoundry.org/ontology/wbls.html
+[wbls.owl]: https://github.com/obophenotype/c-elegans-development-ontology/blob/vWS295/wbls.owl
+[C. elegans Gross Anatomy Ontology]: https://obofoundry.org/ontology/wbbt.html
+
+[wbbt.owl]: https://github.com/obophenotype/c-elegans-gross-anatomy-ontology/blob/v2024-09-24/wbbt.owl
 [Cell Ontology]: http://obofoundry.org/ontology/cl.html
 [2024-08-16]: https://github.com/obophenotype/cell-ontology/releases/tag/v2024-08-16
 [cl.owl]: https://github.com/obophenotype/cell-ontology/releases/download/v2024-08-16/cl.owl
@@ -97,8 +104,9 @@ The following gene annotation dependencies are *pinned* for this version of the
 | NCBITaxon:9606<br>for Homo sapiens | [GENCODE (Human)] | Human reference GRCh38.p14<br>(GENCODE v44/Ensembl 110) | [gencode.v44.primary_assembly.annotation.gtf] |
 | NCBITaxon:10090<br>for Mus musculus | [GENCODE (Mouse)] | Mouse reference GRCm39<br>(GENCODE vM33/Ensembl 110) | [gencode.vM33.primary_assembly.annotation.gtf] |
 | NCBITaxon:2697049<br>for SARS-CoV-2 | [ENSEMBL (COVID-19)] | SARS-CoV-2 reference (ENSEMBL assembly: ASM985889v3) | [Sars\_cov\_2.ASM985889v3.101.gtf] |
-| NCBITaxon:7955<br>for Danio rerio | [ENSEMBL (Zebrafish)] | GRCz11.112 (Ensembl 112) | [Danio_rerio.GRCz11.112.gtf] |
-| "NCBITaxon:7227"<br>for Drosophila melanogaster| [ENSEMBL (Fruit fly)] | BDGP6.46 (Ensembl 112) | [Drosophila_melanogaster.BDGP6.46.112.gtf] |
+| "NCBITaxon:6239"<br>for Caenorhabditis elegans | [ENSEMBL (Caenorhabditis elegans)] | WBcel235 (GCA_000002985.3)<br>Ensembl 113 | [Caenorhabditis_elegans.WBcel235.113.gtf] |
+| NCBITaxon:7955<br>for Danio rerio | [ENSEMBL (Zebrafish)] | GRCz11 (GCA_000002035.4)<br>Ensembl 113 | [Danio_rerio.GRCz11.113.gtf] |
+| "NCBITaxon:7227"<br>for Drosophila melanogaster| [ENSEMBL (Fruit fly)] | BDGP6.46 (GCA_000001215.4)<br>Ensembl 113 | [Drosophila_melanogaster.BDGP6.46.113.gtf] |
 | | [ThermoFisher ERCC Spike-Ins] | ThermoFisher ERCC RNA Spike-In Control Mixes (Cat # 4456740, 4456739) | [cms_095047.txt] |

 [RNA Spike-In Control Mixes]: https://www.thermofisher.com/document-connect/document-connect.html?url=https%3A%2F%2Fassets.thermofisher.com%2FTFS-Assets%2FLSG%2Fmanuals%2Fcms_086340.pdf&title=VXNlciBHdWlkZTogRVJDQyBSTkEgU3Bpa2UtSW4gQ29udHJvbCBNaXhlcyAoRW5nbGlzaCAp
@@ -112,11 +120,14 @@ The following gene annotation dependencies are *pinned* for this version of the
 [ENSEMBL (COVID-19)]: https://covid-19.ensembl.org/index.html
 [Sars\_cov\_2.ASM985889v3.101.gtf]: https://ftp.ensemblgenomes.org/pub/viruses/gtf/sars_cov_2/Sars_cov_2.ASM985889v3.101.gtf.gz

+[ENSEMBL (Caenorhabditis elegans)]: https://useast.ensembl.org/Caenorhabditis_elegans/Info/Index
+[Caenorhabditis_elegans.WBcel235.113.gtf]: https://ftp.ensembl.org/pub/release-113/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.113.gtf.gz
+
 [ENSEMBL (Zebrafish)]: https://useast.ensembl.org/Danio_rerio/Info/Index
-[Danio_rerio.GRCz11.112.gtf]: https://ftp.ensembl.org/pub/release-112/gtf/danio_rerio/Danio_rerio.GRCz11.112.gtf.gz
+[Danio_rerio.GRCz11.113.gtf]: https://ftp.ensembl.org/pub/release-113/gtf/danio_rerio/Danio_rerio.GRCz11.113.gtf.gz

 [ENSEMBL (Fruit fly)]: https://www.ensembl.org/Drosophila_melanogaster/Info/Index
-[Drosophila_melanogaster.BDGP6.46.112.gtf]: https://ftp.ensembl.org/pub/release-112/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.112.gtf.gz
+[Drosophila_melanogaster.BDGP6.46.113.gtf]: https://ftp.ensembl.org/pub/release-113/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.113.gtf.gz

 [ThermoFisher ERCC Spike-Ins]: https://www.thermofisher.com/order/catalog/product/4456740#/4456740
 [cms_095047.txt]: https://assets.thermofisher.com/TFS-Assets/LSG/manuals/cms_095047.txt
@@ -128,27 +139,57 @@ The following gene annotation dependencies are *pinned* for this version of the
 ### development_stage_ontology_term_id
- - - - - - - - - - - + + + + + + + + + + +
Key | development_stage_ontology_term_id
Annotator | Curator MUST annotate.
Value - categorical with str categories. If unavailable, this MUST be "unknown".

- If organism_ontolology_term_id is "NCBITaxon:7955" for Danio rerio, then this MUST be the most accurate descendant of ZFS:0100000 for zebrafish stage and MUST NOT be ZFS:0000000 for Unknown.

If organism_ontolology_term_id is "NCBITaxon:7227" for Drosophila melanogaster, then this MUST be the most accurate FBdv term. -

Otherwise, for all other organisms this MUST be the most accurate descendant of UBERON:0000105 for life cycle stage, excluding UBERON:0000071 for death stage. -
Key | development_stage_ontology_term_id
Annotator | Curator MUST annotate.
Value + categorical with str categories. If unavailable, this MUST be "unknown".

+ + + + + + + + + + + + + + + + + + + + + +
For organism_ontology_term_id | Value
+ "NCBITaxon:6239"
for Caenorhabditis elegans +
+ MUST be the most accurate descendant of WBls:0000075
for worm life stage +
+ "NCBITaxon:7955"
for Danio rerio +
+ MUST be the most accurate descendant of ZFS:0100000
for zebrafish stage and MUST NOT be ZFS:0000000 for Unknown +
+ "NCBITaxon:7227"
for Drosophila melanogaster +
+ MUST be the most accurate FBdv term +
+

---- - ### organism_cell_type_ontology_term_id @@ -163,7 +204,15 @@ The following gene annotation dependencies are *pinned* for this version of the
Value - categorical with str categories.

+ categorical with str categories. This MUST be "unknown" when: +
    +
  • + no appropriate term can be found (e.g. the cell type is unknown) +
  • +
  • + assay_ontology_term_id is "EFO:0010961" for Visium Spatial Gene Expression, uns['spatial']['is_single'] is True, and the corresponding value of in_tissue is 0 +
  • +
@@ -172,40 +221,27 @@ The following gene annotation dependencies are *pinned* for this version of the + - + -
For organism_ontolology_term_id
- "NCBITaxon:7955"
for Danio rerio + "NCBITaxon:6239"
for Caenorhabditis elegans
- MUST be either the most accurate descendant of ZFA:0009000 for cell
or "unknown" when: -
    -
  • - no appropriate term can be found (e.g. the cell type is unknown) -
  • -
  • - assay_ontology_term_id is "EFO:0010961" for
    Visium Spatial Gene Expression, uns['spatial']['is_single'] is True,
    and the corresponding value of in_tissue is 0 -
  • -
+ MUST be the most accurate descendant of WBbt:0004017 for Cell
- "NCBITaxon:7227"
for Drosophila melanogaster + "NCBITaxon:7955"
for Danio rerio
MUST be either the most accurate descendant of FBbt:00007002 for cell
or "unknown" when: -
    -
  • - no appropriate term can be found (e.g. the cell type is unknown) -
  • -
  • - assay_ontology_term_id is "EFO:0010961" for
    Visium Spatial Gene Expression, uns['spatial']['is_single'] is True,
    and the corresponding value of in_tissue is 0 -
  • -
+
+ MUST be the most accurate descendant of ZFA:0009000 for cell
- All other values of
organism_ontology_term_id + "NCBITaxon:7227"
for Drosophila melanogaster +
MUST be the most accurate descendant of FBbt:00007002 for cell MUST be "na"
@@ -230,7 +266,12 @@ The following gene annotation dependencies are *pinned* for this version of the
Value - categorical with str categories. This MUST be a descendant of NCBITaxon:33208 for Metazoa.

If organism_ontology_term_id is "NCBITaxon:7955" for Danio rerio or "NCBITaxon:7227" for Drosophila melanogaster, then all observations MUST contain the same value. + categorical with str categories. This MUST be a descendant of NCBITaxon:33208 for Metazoa.

All observations MUST contain the same value when the organism_ontology_term_id is: +
@@ -261,6 +302,14 @@ The following gene annotation dependencies are *pinned* for this version of the + + + "NCBITaxon:6239"
for Caenorhabditis elegans + + + MUST be the most accurate descendant of WBbt:0005766 for Anatomy + + "NCBITaxon:7955"
for Danio rerio @@ -277,12 +326,6 @@ The following gene annotation dependencies are *pinned* for this version of the MUST be the most accurate descendant of FBbt:10000000 for
anatomical entity and MUST NOT be FBbt:00007002
for cell or any of its descendants. - - - All other values of
organism_ontology_term_id - - MUST be "na" - @@ -292,6 +335,27 @@ The following gene annotation dependencies are *pinned* for this version of the --- +### sex_ontology_term_id + + + + + + + + + + + + + + +
Key | sex_ontology_term_id
Annotator | Curator MUST annotate.
Value | categorical with str categories. If unavailable, this MUST be "unknown".

If organism_ontology_term_id is "NCBITaxon:6239" for Caenorhabditis elegans, this MUST be PATO:0000384 for male or PATO:0001340 for hermaphrodite.

Otherwise, this MUST be a descendant of PATO:0001894 for phenotypic sex. +
+
+ +--- + ### tissue_type @@ -306,12 +370,18 @@ The following gene annotation dependencies are *pinned* for this version of the
Value - categorical with str categories.

If organism_ontology_term_id is "NCBITaxon:7955" for Danio rerio or "NCBITaxon:7227" for Drosophila melanogaster, then the value MUST be "tissue".

Otherwise, the value MUST be "tissue", "organoid", or "cell culture". + categorical with str categories.

The value MUST be "tissue" when the organism_ontology_term_id is: + Otherwise, the value MUST be "tissue", "organoid", or "cell culture".

+
 ---

 ## var and raw.var (Gene Metadata)
@@ -355,6 +425,12 @@ The following gene annotation dependencies are *pinned* for this version of the
 "NCBITaxon:2697049"
+
+ Caenorhabditis elegans
+
+ "NCBITaxon:6239"
+
+
 Danio rerio
@@ -388,18 +464,34 @@ The following gene annotation dependencies are *pinned* for this version of the
 * General Requirements
   * Updated requirements for supported organisms
 * Required Ontologies
+  * Added C. elegans Development Ontology (WBls) release 2024-09-26 Wormbase WS295
+  * Added C. elegans Gross Anatomy Ontology (WBbt) release 2024-09-24 Wormbase WS295
   * Added Drosophila Anatomy Ontology (FBbt) release 2024-08-08
   * Added Drosophila Development Ontology (FBdv) release 2024-08-07
   * Added Zebrafish Anatomy Ontology (ZFA+ZFS) release 2022-12-09
 * Required Gene Annotations
   * Refactored table to include NCBI Taxon for supported organisms
-  * Added *Danio rerio* Reference GRCz11.112 (Ensembl 112)
-  * Added *Drosophila melanogaster* Reference BDGP6.46 (Ensembl 112)
+  * Added *Caenorhabditis elegans* WBcel235 (GCA_000002985.3) Ensembl 113
+  * Added *Danio rerio* GRCz11 (GCA_000002035.4) Ensembl 113
+  * Added *Drosophila melanogaster* BDGP6.46 (GCA_000001215.4) Ensembl 113
 * obs (Cell metadata)
-  * Updated `development_stage_ontology_term_id` for *Danio rerio* and *Drosophila melanogaster*
+  * Updated `development_stage_ontology_term_id` to include:
+    * *Caenorhabditis elegans*
+    * *Danio rerio*
+    * *Drosophila melanogaster*
   * Added `organism_cell_type_ontology_term_id`
-  * Updated `organism_ontology_term_id` for *Danio rerio* and *Drosophila melanogaster* to require all observations to contain the same value
+  * Updated `organism_ontology_term_id` to require all observations to contain the same value for:
+    * *Caenorhabditis elegans*
+    * *Danio rerio*
+    * *Drosophila melanogaster*
   * Added `organism_tissue_ontology_term_id`
-  * Updated `tissue_type` to require `"tissue"` for *Danio rerio* and *Drosophila melanogaster*
+  * Updated `sex_ontology_term_id` for *Caenorhabditis elegans*
+  * Updated `tissue_type` to require `"tissue"` for:
+    * *Caenorhabditis elegans*
+    * *Danio rerio*
+    * *Drosophila melanogaster*
 * var and raw.var (Gene Metadata)
-  * Updated `feature_reference` for *Danio rerio* and *Drosophila melanogaster*
\ No newline at end of file
+  * Updated `feature_reference` to include:
+    * *Caenorhabditis elegans*
+    * *Danio rerio*
+    * *Drosophila melanogaster*
\ No newline at end of file
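
The per-organism rules this patch adds for `organism_ontology_term_id`, `tissue_type`, and `sex_ontology_term_id` can be sketched as a small check. This is a minimal illustration, not the project's validator: the function name `check_obs_rows` and the list-of-dicts input are hypothetical, and it assumes that `"unknown"` stays permitted for *Caenorhabditis elegans* sex, which the draft text does not state explicitly. Rules that need an ontology lookup (the "descendant of" constraints) are left out.

```python
# Taxon IDs named in the draft for the three organisms under temporary constraints.
C_ELEGANS = "NCBITaxon:6239"
DANIO_RERIO = "NCBITaxon:7955"
DROSOPHILA = "NCBITaxon:7227"
SINGLE_ORGANISM_TAXA = {C_ELEGANS, DANIO_RERIO, DROSOPHILA}

# PATO:0000384 (male) and PATO:0001340 (hermaphrodite) come from the draft.
# Allowing "unknown" here is an assumption, not something the draft confirms.
C_ELEGANS_SEX_TERMS = {"PATO:0000384", "PATO:0001340", "unknown"}


def check_obs_rows(rows: list[dict[str, str]]) -> list[str]:
    """Return human-readable violations for a list of obs rows (hypothetical helper)."""
    errors = []

    # All observations must share one organism when any constrained organism appears.
    organisms = {row["organism_ontology_term_id"] for row in rows}
    if organisms & SINGLE_ORGANISM_TAXA and len(organisms) > 1:
        errors.append("organism_ontology_term_id must be the same for all observations")

    for index, row in enumerate(rows):
        organism = row["organism_ontology_term_id"]
        sex = row["sex_ontology_term_id"]

        # tissue_type is restricted to "tissue" for the constrained organisms.
        if organism in SINGLE_ORGANISM_TAXA and row["tissue_type"] != "tissue":
            errors.append(f"row {index}: tissue_type must be 'tissue' for {organism}")

        # C. elegans sex is limited to male or hermaphrodite.
        if organism == C_ELEGANS and sex not in C_ELEGANS_SEX_TERMS:
            errors.append(f"row {index}: sex_ontology_term_id {sex!r} is not allowed for C. elegans")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a curator see every violation in one pass.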