-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/nd2reader-xarray-ome-metadata #533
Feature/nd2reader-xarray-ome-metadata #533
Conversation
@yichechang Thank you for the contribution! It would definitely be nice if you could add a unit test to confirm that the exception handling works. (one "good" ome metadata and one "bad"? It may be the case that you need to catch more exception types...) The place to put it is in https://github.com/AllenCellModeling/aicsimageio/blob/main/aicsimageio/tests/readers/extra_readers/test_nd2_reader.py |
Actually it looks like the nd2 unit test is actually failing on this already. The problem I'm seeing is that |
Hi @toloudis - thanks for the reply and guidance! I saw the failed tests for nd2_reader and at first glance they seem to be related to metadata types, suggesting it has something to do with changes I made. I think I know what happens but will have to think about it more over the weekend as it is related to what behavior one wants for this reader. I have no idea why sldy_reader tests failed though. If I could also get @tlambert03 your opinion on this, it would be very helpful -
This is probably why these tests fail now as If I missed anything or if this feature is against nd2_reader plugin's design, please let me know or tell me if I should close this PR. For my use case (having access to OME metadata in the returned xarray DataArray), I guess I could always just force to use bioformats reader. |
A little bit of digging lead to the following observation:
But I cannot find the specifications of unprocessed vs processed. And presumably the most related documentation is around So at the reader level, Would be great if someone would be able to provide some insights on the design of the API? |
Maybe I can provide some insight!
This is entirely correct!!
This is also entirely correct! I don't think we ever full defined what is "good" or "bad" / "unprocessed" and "processed" metadata. The reason being is that some libraries may work best with a dictionary (i.e. metadata from JSON), some may work best with an XML etree (i.e. metadata from XML). In this case, we generally leave the "unprocessed" as the raw metadata, almost in string form if need be and the "processed" as "whatever object is best for the file format". This is also why we introduced the We defer in the order of "processed" -> "unprocessed" and "ome_metadata" its it own different thing.
Metadata is hard and annoying so this part of the API has always sort of been up in the air imo. |
Really all that is needed to make this PR "acceptable" imo is that we update the test to reflect that a new metadata type is returned. The reader metadata code change is totally fine. Because of our chain of deferring "which metadata is available" the actual test should be updated from |
I also just want to say. THANK YOU for this PR and for your deep dive into our API spec. Something for us to get better at. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thank you! Everything I've read here suggests you've done a careful analysis. You're correct about what nd2
is pumping out of it's processed/unprocessed metadata and I don't really have anything to add. I agree with @evamaxfield that your code appears to be correct, and tests fails appear to be on the side of assumptions made in the aicsimageio tests. I'm also not entirely sure, so the team here is best suited to help with that. But I'm very happy with this change.
Hey everyone! Big thanks for all your feedbacks and for making and maintaining this amazing tool—I'm a huge fan! And kudos for fostering such a friendly community.
So, if I've got this right,
I’ve laid out a few possibilities (current behavior, this PR, and other solutions) and their implications below. I was all in for the changes in the current PR, but now, I'm wondering if it’s really the best fit for aiscimageio overall. If you guys think it’s a good PR, I will fix the failing tests soon or within the month. But if it’s better left unmerged or if there’s a better solution, I’m cool with that too. Looking forward to your thoughts! 🙌
|
Codecov ReportAll modified lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #533 +/- ##
=======================================
Coverage 89.54% 89.55%
=======================================
Files 53 53
Lines 4658 4662 +4
=======================================
+ Hits 4171 4175 +4
Misses 487 487
☔ View full report in Codecov by Sentry. |
I've updated the tests to reflect the new behavior introduced in this PR: The tests does not cover situation when nd2 < v0.7.0. I am not sure how to test package versions, and if that needs to be tested, I think it might be best to include in tests for |
I don't think that needs to be tested here. This is a one-line hasattr test and your catching of NotImplementedError is pretty fail-proof. From the perspective of aicsimageio, readers either return an instance of for all the rest of the stuff (i.e. what each reader is supposed to return from |
Hi @tlambert03, thank you so much for the detailed explanation and patience! In hindsight, this is such a small PR and I am very grateful for receiving lots of useful feedback and help from you all. |
small in terms of lines of code perhaps, but the analysis you did in #533 (comment) is kind of epic and the sort of contribution that all maintainers dream of getting :) |
The key comment is here: aicsimageio/aicsimageio/readers/reader.py Line 699 in 04a92f7
Basically readers can do almost anything they want with PROCESSED and UNPROCESSED. The ome metadata property is the only one that has a required output as an OME object from ome-types. The spirit of PROCESSED vs UNPROCESSED is this (@evamaxfield please correct me if I am remembering this wrong!): UNPROCESSED: the native file format's metadata as read directly from the file. Can often be just a string or a dict. PROCESSED: the native file format's metadata as a higher level object. Up the the discretion of the implementor. Can be a dict, an xml tree, an ome-types object etc. The contract is specific to each reader if metadata is being filtered or modified. |
I think this is correct / aligns with my original intent. @yichechang @toloudis Directly to this PR, I think it is fine for the "PROCESSED" metadata to be the OME object. It is the "more useful / most useful version of the metadata". In which case the only change needed is that the test should be updated to expect an OME object.
Seconding this comment. When I saw your comment I was over the moon. It is not only helpful for the PR but also for knowing where our existing API is under-defined / hard to follow / poorly documented. |
Thank you all again for your kind words and insights : ) In the most recent checks, all nd2 tests have passed I believe. Those that failed are
Any idea if some of these might be directly related to this PR? |
I thought that the most recent PR #529 merged to main might have something to do with the tests failed here (as mentioned above, for Now I have a merge commit (commit ed62d8a) and I hope it's not horrible. Let me know if anything needs me fixing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thank you for the pull request!
Merging this PR, I believe |
Description
Include parsed metadata (OME) as attrs of returned xarray DataArray when using nd2_reader.
The OME metadata is already accessible from the nd2_reader since 4.12.1 (specifically from PR #521), but is currently not included in the xarray interface.
It's my first PR and while I tried to dig in to tests for
Reader
,AICSImage
, and extra readers, I didn't see if this specific feature is covered/should be included. I'm probably also missing a lot of things and will be grateful if anyone would point me some reference and/or just say "read this!".Also - if an issue/discussion is required before opening a PR, please let me know and I'll change this to a draft.
Pull request recommendations:
Didn't see where this should be tested or if at all tested for other readers. But it's most likely that I missed something... I'll be happy to include tests if I could get some suggestions/guidance.