
feat(4.0): Add check for conditions that would cause anndata.write to fail. #643

Merged
merged 17 commits into main from tsmith/405-no-fail-add-labels
Sep 27, 2023


Bento007
Contributor

@Bento007 Bento007 commented Sep 22, 2023

Reason for Change

Changes

  • Add a check for categorical columns with non-string values.
  • Add a check for columns with mixed types.
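A minimal sketch of how the two checks could look on a pandas DataFrame such as `obs` or `var`. The helper name `find_write_blockers` is hypothetical, and the mixed-type logic is an assumption, since that part of the diff is not shown on this page; the categorical test uses the same set-of-types idea as the `validate.py` snippet in the review below.

```python
import pandas as pd


def find_write_blockers(df: pd.DataFrame, df_name: str) -> list[str]:
    """Hypothetical helper: collect errors for the column conditions this PR
    reports as causing anndata.write to fail."""
    errors = []
    for column_name, column in df.items():
        if isinstance(column.dtype, pd.CategoricalDtype):
            # Categorical columns: every category must be a Python str.
            category_types = {type(x) for x in column.dtype.categories}
            if category_types != {str}:
                errors.append(
                    f"Column '{column_name}' in dataframe '{df_name}' must only "
                    f"contain string categories. Found {category_types}."
                )
        elif pd.api.types.is_object_dtype(column):
            # Object columns (assumed rule): all non-null values share one type.
            value_types = {type(x) for x in column.dropna()}
            if len(value_types) > 1:
                errors.append(
                    f"Column '{column_name}' in dataframe '{df_name}' "
                    f"cannot contain mixed types. Found {value_types}."
                )
    return errors
```

Collecting messages into a list, rather than raising on the first problem, matches the validator's pattern of appending to `self.errors`.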

Testing

  • Added unit test.

@codecov

codecov bot commented Sep 25, 2023

Codecov Report

Merging #643 (7a9b7a3) into main (7c67e97) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
+ Coverage   83.77%   83.81%   +0.04%     
==========================================
  Files          19       19              
  Lines        1744     1749       +5     
==========================================
+ Hits         1461     1466       +5     
  Misses        283      283              
Flag Coverage Δ
unittests 83.81% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
cellxgene_schema_cli/cellxgene_schema/validate.py 94.71% <100.00%> (+0.04%) ⬆️


Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment


Left a suggestion on cleaning up the flag help message; otherwise this looks good! It might be good to get a thumbs-up from Jason on this approach, the help text, and the flag name even before merging.

- Having mixed types in a column
- Having non-string values for categories
@Bento007 Bento007 changed the title feat(4.0): Add "write-check" flag to check if labels can be added feat(4.0): Add check for conditions that would cause anndata.write to fail. Sep 25, 2023
catagory_types = {type(x) for x in column.dtype.categories.values}
if len(catagory_types) > 1 or str not in catagory_types:
    self.errors.append(
        f"Column '{column_name}' in dataframe '{df_name}' must only contain string catagories. Found {catagory_types}."
    )
Contributor


nit catagories -> categories

Contributor


same w/ catagory_types var name

Contributor


also, I think it is true that we only support str categories in our current schema definition. But are we 100% certain that will always be true? If this is an anndata constraint, mind linking me to a doc or comment specifying this?

Contributor Author


This is not documented in anndata because it is a bug. Anndata should support non-string values as categories, and this is fixed in anndata 0.10.0.

Contributor


Gotcha. Mind adding a comment to this effect? Maybe a TODO, so that we remember to remove this check once we move to anndata 0.10.0, and so that if we get a non-string categorical field in schema 5, we remember to highlight this anndata 0.8.0 bug to product.
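One way to encode the requested TODO is a version gate, so that the string-category check only runs where the bug applies. A minimal sketch, assuming a hypothetical helper `needs_string_category_check` that receives the installed version string (for example `anndata.__version__`) and relies only on the statement above that the bug is fixed in 0.10.0:

```python
def needs_string_category_check(anndata_version: str) -> bool:
    """Hypothetical helper: True when the installed anndata predates 0.10.0,
    the release said to fix writing non-string categories."""
    # TODO: remove this gate (and the check) once the minimum supported
    # anndata version is >= 0.10.0.
    # Compare only the major and minor fields, e.g. "0.8.0" -> (0, 8).
    major, minor = (int(part) for part in anndata_version.split(".")[:2])
    return (major, minor) < (0, 10)
```

This two-field parse is deliberately simple; `packaging.version.Version` would be a more robust comparison if unusual version strings need handling.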

def schema_validate(h5ad_file, add_labels_file, ignore_labels, verbose):
@click.option(
"-w",
"--write-check",
Contributor Author


@jahilton I'd like to get your thoughts on this new flag before I merge.

@Bento007 Bento007 merged commit b765098 into main Sep 27, 2023
8 checks passed
@Bento007 Bento007 deleted the tsmith/405-no-fail-add-labels branch September 27, 2023 19:32
3 participants