-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transect subset selection bug #95
Comments
I implemented a fix (PR #99) to handle missing strata that is based on the scheme previously used (ie, before Brandon's changes leading up to v0.3.1) and reuses a variable that was already available, I also added a test that runs through multiple realizations of transect subset selection: Missing strata are identified and addressed in other parts of It's unclear yet whether any of those can be applied in |
The bug itself was addressed in release 0.3.1. Transect subset selection works again. But there is more work to be done to identify and implement the desired scheme for backfilling missing strata. This will be done in release 0.3.2. |
…rocessing (#126) * Set sex input column to np.int8 data type to trim memory use. Drop unused catch haul_count * Replace float with np.float64 for column data types in kriging_mesh.py, for consistency * Rename geo-strata input "Latitude (upper limit)" column to northlimit_latitude * transect_num in haul-vs-transect file is now handled as an int and can not contain nulls * Use dataframe copy to avoid warnings due to assignment operations on dataframe view * Populate 'Empty value' column in input-files docs page, plus other updates to that page * Skip partially failing transect-selection test until #95 is resolved
This was partially addressed in release 0.3.1, but more work is needed. I'll retag it as release 0.3.2. |
Investigate relationship to issues brought up in #139 |
PR #89 overhauled the interim transect subset selection approach initially implemented in v0.1.0-alpha. The intent was to implement the approach found in Matlab EchoPro, as described in #33.
I have found a bug that occurs in some circumstances. The bug emerged when rerunning the bootstrapping_walkthrough.ipynb notebook with the current
main
. PR #89 introduced a very small change to that notebook, changing theremoval_percentage
argument value inboot_obj.run_bootstrapping
from 50.0 to 60.0. The notebook runs successfully when using 50.0, butboot_obj.run_bootstrapping
results in a transect selection error when using 60.0Currently the notebook transect_selection_workflow.ipynb runs successfully while also applying transect subsetting. But the percentage used is also 50.0%. In addition, there is a transect subsetting test, test_transect_selection.py::test_transect_selection_output, that also runs successfully; rather than a %, it uses a preselected list of transect ids.
The error happens in
computation/transect_results.py::set_adult_NASC
, specifically in this statement:https://github.com/uw-echospace/EchoPro/blob/9dc4708b409f9b0a71897dd06a690d26eb1a2e8d/EchoPro/computation/transect_results.py#L1272-L1274
It's a
KeyError
(eg, "KeyError: '[6] not in index'") generated whennasc_fraction_adult_df
is missing a stratum number found inself.nasc_df.stratum_num
. As described in #33, when a subset of transects is requested, the reduced transect set may not contain all possiblestratum_num
. A scheme is needed to backfill values for the "missing" strata. The scheme introduced in v0.1.0-alpha and described in #33 under "Current solution for missing strata" followed some simple rules Brandon and I devised to fill or interpolate missing strata. PR #89 replaced that strategy with the more involved (and possibly opaque?) scheme used in the Matlab EchoPro code. However, fortransect_results.py::set_adult_NASC
, it looks like the missing-strata scheme was overlooked. The variablenasc_fraction_adult_df
is created based on the strata found inself.bin_ds.len_age_dist_all
, then its values are populated by looping over those strata in a for loop. Therefore, when strata are missing inself.bin_ds.len_age_dist_all
, they are never backfilled innasc_fraction_adult_df
, which leads to the error. Note:self.bin_ds
is an Xarray Dataset that containsstratum_num
as a dimension and coordinate variable;self.bin_ds.len_age_dist_all
contains that dimension.This error has nothing to do with bootstrapping per se. But because bootstrapping generates several realizations of a transect subset, it more easily led to the error condition.
The text was updated successfully, but these errors were encountered: