-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add feature length field, annotated by add-labels function #624
Conversation
Codecov Report
@@ Coverage Diff @@
## main #624 +/- ##
==========================================
+ Coverage 83.02% 83.17% +0.14%
==========================================
Files 19 19
Lines 1703 1718 +15
==========================================
+ Hits 1414 1429 +15
Misses 289 289
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I can't believe I'm asking this but what is a "spike-in protein"? From Google:
|
oops--I did mean spike-in control, I have no idea why I wrote protein 😅 sorry! |
@nayib-jose-gloria tangentially-related: would you mind cleaning up this method description, i.e. turn it into parseable English 🙏 single-cell-curation/cellxgene_schema_cli/cellxgene_schema/write_labels.py Lines 324 to 325 in f6bb232
|
return self.gene_dict[gene_id] | ||
return self.gene_dict[gene_id][0] | ||
|
||
def get_length(self, gene_id) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type gene_id
mapping_dict = {} | ||
|
||
for i in ids: | ||
if i.startswith("ENS"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we start putting hardcoded string into a constant.py file? Not sure we have enough cases for it to make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good thing to do at the end of the epic? Or we can start now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary at this point, we can revisit if we find that it's coming up frequently and causing confusion.
for df in ["var", "raw.var"]: | ||
for i in self.validator.schema_def["components"][df]["index"]["add_labels"]: | ||
column = i["to_column"] | ||
with self.subTest(column=column, df=df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(repeat from another PR) let's sync on this today--this whole module of schema compliance tests uses unittest libraries / syntax, so switching over to pytest I think is out of scope for this ticket + and the other open issues. But we can discuss if it's worth ticketing separately to swap everything over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell-curation/633 tracking a separate issue for this
column = i["to_column"] | ||
with self.subTest(column=column, df=df): | ||
# Resetting validator | ||
self.validator.adata = examples.adata.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these 2 lines can be removed if you switch to a pytest approach.
:return A gene length | ||
""" | ||
|
||
if not self.is_valid_id(gene_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Bento007 has encouraged me to avoid negatives a la if not...
where possible
Could turn the last few lines of this method into a one-liner:
return self.gene_dict[gene_id][1] if self.is_valid_id(gene_id) else raise ValueError(f"The id '{gene_id}' is not a valid ENSEMBL id for '{self.species}'")
🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update to avoid the negative, but I personally find one-liners harder to read
self.assertEqual(self.writer._get_mapping_dict_feature_length(ids), expected_dict) | ||
|
||
# Bad | ||
ids = ["NO_GENE"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all of the test_get_dictionary_*
tests in TestAddLabelFunctions
, you wish to consider factoring out common dict comparison logic and ids
initial variables. Could create a helper function or could simply group all under a single test. Not required though.
#516
Changes: