-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype subpackages along processing levels #817
Conversation
for more information, see https://pre-commit.ci
Codecov Report
@@ Coverage Diff @@
## dev #817 +/- ##
===========================================
- Coverage 82.20% 56.25% -25.96%
===========================================
Files 52 2 -50
Lines 5018 48 -4970
===========================================
- Hits 4125 27 -4098
+ Misses 893 21 -872
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@emiliom : a question for you: |
@leewujung can you add links to references or web pages that describe the calculations? Some of those may be found in https://github.com/open-ocean-sounding/echopy TODO:
Below I flesh out individual modules or functions as found in this PR, so we can see it in one place. (E) = Existing
|
Based on discussion with @leewujung and @emiliom we have agreed that the mask sub-package functions should only return a mask and not apply the mask. For this reason, we need to create a function that applies masks, say |
Thanks for the summary @b-reyes. My inclination is:
My inclination is to restrict both to be a Dataset for the following reasons:
I agree that under the |
I, too, agree with
I'm on board with the function taking in an Sv Dataset. But I'd like to go further in the specification:
It may be helpful to look over the |
Great, it sounds like we all agree on the location of the
This is a good point that I had not thought of. Also supplying the name seems like a cumbersome task for the user. Perhaps this function should only take in a DataArray? That way we do not have to worry about this.
Another good point and something that is not accounted for in #901. I do not like the idea of the user needing to supply this name, but it seems like it might be needed. @leewujung pinging you as we should decide on this before full review of #901.
I would assume that it would output a new Dataset, so the original source could remain unmodified. However, we could always make this an input flag, say
I think including the mask in the output Dataset is not preferable. The masks are already an input of the function, so the user could already save them, if they want to. Additionally, attaching the mask would increase the data size. However, If we really want to make this an option, we could have a keyword, say |
Thanks @b-reyes @emiliom for the discussions above. Input format:
Remove noise data variable name:
Output:
I'll ping both of you off github to resolve these discussions. |
Based on discussions between @leewujung and myself, we have arrived at the following for
|
In the above comment I mention that we should let |
I agree with omitting |
Just a comment that in the context of xarray (and netcdf/zarr) a "fill value" is typically (or always?) a scalar. It's intended to be literally a single value. I see that the types you've listed come directly from |
@emiliom : yes we did discuss limiting it to only a scalar, but since we are using xr.where under the hood, seems a convenient to just allow a DataArray. In our own use of masks, it will just be NaN by default. |
In addition to the inputs provided above for |
Current state of individual modules or functions, existing and planned, organized by subpackage. The order of subpackages roughly reflects processing order. TODO:
(E) = Existing
|
I'll close this PR now since it is very outdated. Some of the functions mentioned here in prototype form have been implemented, and others either won't be implemented in echopype (those related to region and line labels under |
NOT ready for review/discussion yet! I am still moving things around and will request for review when ready.
This PR prototypes subpackages that lie along data processing levels to bring data beyond the calibrated Sv. These include (from low to high processing levels, with some mixing in between):
convert
(existing)calibrate
(existing)filter
: remove noise or reducing variability in the dataunify
mask
metrics
: summary statistics and NASC