Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transect subset selection bug #95

Open
emiliom opened this issue Sep 26, 2023 · 4 comments
Open

Transect subset selection bug #95

emiliom opened this issue Sep 26, 2023 · 4 comments
Labels
bug_in_python Something in the Python implementation does not match what's in Matlab
Milestone

Comments

@emiliom
Copy link
Contributor

emiliom commented Sep 26, 2023

PR #89 overhauled the interim transect subset selection approach initially implemented in v0.1.0-alpha. The intent was to implement the approach found in Matlab EchoPro, as described in #33.

I have found a bug that occurs in some circumstances. The bug emerged when rerunning the bootstrapping_walkthrough.ipynb notebook with the current main. PR #89 introduced a very small change to that notebook, changing the removal_percentage argument value in boot_obj.run_bootstrapping from 50.0 to 60.0. The notebook runs successfully when using 50.0, but boot_obj.run_bootstrapping results in a transect selection error when using 60.0

Currently the notebook transect_selection_workflow.ipynb runs successfully while also applying transect subsetting. But the percentage used is also 50.0%. In addition, there is a transect subsetting test, test_transect_selection.py::test_transect_selection_output, that also runs successfully; rather than a %, it uses a preselected list of transect ids.

The error happens in computation/transect_results.py::set_adult_NASC, specifically in this statement:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1272-L1274
It's a KeyError (eg, "KeyError: '[6] not in index'") generated when nasc_fraction_adult_df is missing a stratum number found in self.nasc_df.stratum_num. As described in #33, when a subset of transects is requested, the reduced transect set may not contain all possible stratum_num. A scheme is needed to backfill values for the "missing" strata. The scheme introduced in v0.1.0-alpha and described in #33 under "Current solution for missing strata" followed some simple rules Brandon and I devised to fill or interpolate missing strata. PR #89 replaced that strategy with the more involved (and possibly opaque?) scheme used in the Matlab EchoPro code. However, for transect_results.py::set_adult_NASC, it looks like the missing-strata scheme was overlooked. The variable nasc_fraction_adult_df is created based on the strata found in self.bin_ds.len_age_dist_all, then its values are populated by looping over those strata in a for loop. Therefore, when strata are missing in self.bin_ds.len_age_dist_all, they are never backfilled in nasc_fraction_adult_df, which leads to the error. Note: self.bin_ds is an Xarray Dataset that contains stratum_num as a dimension and coordinate variable; self.bin_ds.len_age_dist_all contains that dimension.

This error has nothing to do with bootstrapping per se. But because bootstrapping generates several realizations of a transect subset, it more easily led to the error condition.

@emiliom emiliom added the bug_in_python Something in the Python implementation does not match what's in Matlab label Sep 26, 2023
@emiliom emiliom added this to the Release 0.3.1 milestone Sep 26, 2023
@emiliom
Copy link
Contributor Author

emiliom commented Sep 28, 2023

I implemented a fix (PR #99) to handle missing strata that is based on the scheme previously used (ie, before Brandon's changes leading up to v0.3.1) and reuses a variable that was already available, self.stratum_choices. For missing strata, this variable is prepopulated with the neighboring strata (1 or 2, I think) that can be used to populate the missing strata. Here's what I implemented:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1261-L1267

I also added a test that runs through multiple realizations of transect subset selection:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/tests/computation/test_transect_selection.py#L13

Missing strata are identified and addressed in other parts of transect_results.py. For example, see this block that calls a few methods to that effect:
https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1378-L1381

It's unclear yet whether any of those can be applied in set_adult_NASC.

@emiliom
Copy link
Contributor Author

emiliom commented Sep 29, 2023

The bug itself was addressed in release 0.3.1. Transect subset selection works again. But there is more work to be done to identify and implement the desired scheme for backfilling missing strata. This will be done in release 0.3.2.

emiliom added a commit to emiliom/EchoPro that referenced this issue Dec 3, 2023
emiliom added a commit that referenced this issue Dec 3, 2023
…rocessing (#126)

* Set sex input column to np.int8 data type to trim memory use. Drop unused catch haul_count

* Replace float with np.float64 for column data types in kriging_mesh.py, for consistency

* Rename geo-strata input "Latitude (upper limit)" column to northlimit_latitude

* transect_num in haul-vs-transect file is now handled as an int and can not contain nulls

* Use dataframe copy to avoid warnings due to assignment operations on dataframe view

* Populate 'Empty value' column in input-files docs page, plus other updates to that page

* Skip partially failing transect-selection test until #95 is resolved
@emiliom
Copy link
Contributor Author

emiliom commented Dec 8, 2023

This was partially addressed in release 0.3.1, but more work is needed. I'll retag it as release 0.3.2.

@emiliom emiliom modified the milestones: Release 0.3.1, Release 0.3.2 Dec 8, 2023
@emiliom emiliom self-assigned this Dec 8, 2023
@emiliom
Copy link
Contributor Author

emiliom commented Dec 8, 2023

Investigate relationship to issues brought up in #139

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug_in_python Something in the Python implementation does not match what's in Matlab
Projects
Status: No status
Development

No branches or pull requests

2 participants