Commit

updated X matrix requirement (#1157)
brianraymor authored Dec 19, 2024
1 parent 2fc4898 commit b4b1a5f
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions schema/drafts/5.3.0.md
@@ -163,9 +163,8 @@ The types below are python3 types. Note that a python3 `str` is a sequence of Un

## `X` (Matrix Layers)

-The data stored in the `X` data matrix is the data that is viewable in CELLxGENE Explorer. CELLxGENE does not impose any additional constraints on the `X` data matrix.
+The data stored in the `AnnData.X` data matrix is the data that is viewable in CELLxGENE Explorer. For `AnnData.X`, `AnnData.raw.X`, and all layers, if a data matrix contains 50% or more values that are zeros, it MUST be encoded as a [`scipy.sparse.csr_matrix`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) with zero values encoded as <a href="https://docs.scipy.org/doc/scipy/tutorial/sparse.html#sparse-arrays-implicit-zeros-and-duplicates">implicit zeros</a>.

-In any layer, if a matrix has 50% or more values that are zeros, it is STRONGLY RECOMMENDED that the matrix be encoded as a [`scipy.sparse.csr_matrix`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) with zero values encoded as <a href="https://docs.scipy.org/doc/scipy/tutorial/sparse.html#sparse-arrays-implicit-zeros-and-duplicates">implicit zeros</a>.

CELLxGENE's matrix layer requirements are tailored to optimize data reuse. Because each assay has different characteristics, the requirements differ by assay type. In general, CELLxGENE requires submission of "raw" data suitable for computational reuse when a standard raw matrix format exists for an assay. It is STRONGLY RECOMMENDED to also include a "normalized" matrix with processed values ready for data analysis and suitable for visualization in CELLxGENE Explorer. So that CELLxGENE's data can be provided in download formats suitable for both R and Python, the schema imposes the following requirements:
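
Reviewer note on the hunk above: the new MUST can be illustrated with a short sketch. This is not CELLxGENE's validator; the helper names `zero_fraction` and `encode_matrix` and the `threshold` parameter are hypothetical, introduced only for illustration. The sketch assumes the matrix is either a NumPy array or a SciPy sparse matrix, measures the fraction of zero values, and converts to `scipy.sparse.csr_matrix` with explicitly stored zeros removed when that fraction is 50% or more.

```python
import numpy as np
from scipy import sparse


def zero_fraction(matrix):
    """Return the fraction of entries that are zero (dense or sparse input)."""
    n_total = matrix.shape[0] * matrix.shape[1]
    if n_total == 0:
        return 0.0
    if sparse.issparse(matrix):
        # count_nonzero() ignores explicitly stored zeros, unlike .nnz
        n_nonzero = matrix.count_nonzero()
    else:
        n_nonzero = np.count_nonzero(matrix)
    return 1.0 - n_nonzero / n_total


def encode_matrix(matrix, threshold=0.5):
    """Hypothetical helper: return a CSR matrix when zeros reach the threshold."""
    if zero_fraction(matrix) >= threshold:
        # copy=True so the caller's matrix is never modified in place
        encoded = sparse.csr_matrix(matrix, copy=True)
        # Drop explicitly stored zeros so that every zero is implicit
        encoded.eliminate_zeros()
        return encoded
    return matrix


mostly_zero = np.array([[0, 0, 1], [0, 2, 0]])  # 4 of 6 values are zero
mostly_full = np.array([[1, 2], [3, 0]])        # 1 of 4 values is zero
encoded = encode_matrix(mostly_zero)
unchanged = encode_matrix(mostly_full)
```

Under the same assumptions, the check would be applied separately to `AnnData.X`, `AnnData.raw.X`, and each layer, since the requirement covers all of them.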

@@ -2097,6 +2096,8 @@ When a dataset is uploaded, CELLxGENE Discover MUST automatically add the `schem
* Updated the requirements for <code>spatial[<i>library_id</i>]['scalefactors']</code> to include descendants of _Visium Spatial Gene Expression_.
* Updated the requirements for <code>spatial[<i>library_id</i>]['scalefactors']['spot_diameter_fullres']</code> to include descendants of _Visium Spatial Gene Expression_.
* Updated the requirements for <code>spatial[<i>library_id</i>]['scalefactors']['tissue_hires_scalef']</code> to include descendants of _Visium Spatial Gene Expression_.
+* X (Matrix Layers)
+* Updated the STRONGLY RECOMMENDED requirement to a MUST. A matrix with 50% or more values that are zeros MUST be encoded as `scipy.sparse.csr_matrix`.

### schema v5.2.0
