ml_analyzed should be computed from hdr and adc info to select the best estimate #70

hsosik · 2022-09-06T21:30:32Z

Some bins have obviously bad info in the hdr file for ml_analyzed estimates; in other cases the problem is less obvious, but the values are still bad. A brute-force approach is to compute ml_analyzed both ways, compare the results, and select the adc-based value whenever the difference is outside some tolerance. This presumes the adc value is more likely to be correct, which seems to be true from my inspection of the results.

hsosik · 2022-09-06T21:32:53Z

A reasonable starting place for criteria and tolerance:

% keep the hdr estimate unless it differs from the adc estimate by 0.05 or more
if abs(ml_analyzed_hdr - ml_analyzed_adc) < 0.05
    ml_analyzed = ml_analyzed_hdr;
else
    ml_analyzed = ml_analyzed_adc;
end
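
For reference, here is a minimal Python sketch of the same selection rule. The function name `select_ml_analyzed` and the `tolerance` parameter are hypothetical names chosen for illustration, not taken from the actual Python codebase; the 0.05 default is the starting tolerance proposed above. The sample values are two of the SG2105 bins reported in this thread.

```python
def select_ml_analyzed(ml_analyzed_hdr, ml_analyzed_adc, tolerance=0.05):
    """Return the hdr-based estimate unless it disagrees with the adc-based one.

    Hypothetical helper: mirrors the MATLAB-style rule above, where the hdr
    value is kept only if it is within `tolerance` of the adc value.
    """
    if abs(ml_analyzed_hdr - ml_analyzed_adc) < tolerance:
        return ml_analyzed_hdr
    return ml_analyzed_adc


# Two bins from the SG2105 list in this thread: both hdr values fall
# outside the tolerance, so the adc-based value is selected.
print(select_ml_analyzed(0.88862, 0.83318))   # D20210503T171928_IFCB115
print(select_ml_analyzed(0.024627, 2.9072))   # D20210512T155857_IFCB102
```

Note that the comparison is strict, so a difference of exactly 0.05 falls back to the adc value, as in the original rule.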

hsosik · 2022-09-06T21:35:39Z

Among the 1074 bins in the SG2105 dataset, these 19 have a bad ml_analyzed_hdr according to the above criterion:

                pid                 ml_analyzed_hdr    ml_analyzed_adc
____________________________    _______________    _______________

{'D20210503T171928_IFCB115'}         0.88862           0.83318    
{'D20210507T225916_IFCB102'}          32.456            3.3309    
{'D20210511T220157_IFCB102'}          13.123            2.2706    
{'D20210512T155857_IFCB102'}        0.024627            2.9072    
{'D20210513T003630_IFCB102'}        0.024627            2.0426    
{'D20210513T113350_IFCB102'}        0.024627            2.6736    
{'D20210513T234948_IFCB102'}          6.5381            2.6978    
{'D20210514T090858_IFCB102'}        0.024627            2.5023    
{'D20210514T172134_IFCB102'}          2.7174            2.1922    
{'D20210515T163434_IFCB102'}           10.81            3.0479    
{'D20210515T232812_IFCB102'}           4.987            2.9775    
{'D20210518T014048_IFCB102'}          10.809            2.6808    
{'D20210518T081012_IFCB102'}          4.9725            2.6879    
{'D20210518T205045_IFCB102'}         0.49873            2.6363    
{'D20210519T152904_IFCB102'}          4.9872            2.1729    
{'D20210519T161741_IFCB102'}      0.00049828            2.3269    
{'D20210520T025630_IFCB102'}          32.687            2.7402    
{'D20210520T135944_IFCB102'}          4.9873            2.8065    
{'D20210520T182732_IFCB102'}           6.397            1.4393

joefutrelle · 2022-09-07T14:49:03Z

@hsosik why not simply always use the value computed from the ADC data?

hsosik · 2022-09-07T14:55:37Z

The hdr value is more accurate, when it's not wrong. It is better most of the time.

joefutrelle · 2023-06-01T16:35:07Z

Reopening this to report on recent work and close out the changes.

hsosik · 2023-10-18T17:16:40Z

I don't see recent activity, so maybe this is not the correct place to post this; let me know if there's another, more active issue. I now see something unexpected in the results in the IFCB dashboard database for at least one case in 2023.

For this bin:
https://ifcb-data.whoi.edu/bin?bin=D20230727T025611_IFCB127
volume analyzed is now reported as:
Volume Analyzed: 4.988 ml

My MATLAB result is as follows:
>> IFCB_volume_analyzed('https://ifcb-data.whoi.edu/mvco/D20230727T025611_IFCB127.hdr')
ans =
2.8529

It appears that the current Python code must be using two bad lines at the end of the adc file that have inhibittime reported as 0, which is incorrect.

This difference between the MATLAB and Python results should show up in more bins if we do a more systematic comparison between the two, which I think should be done to make sure there are no other inconsistencies.
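
A systematic comparison could be sketched roughly as follows, assuming the MATLAB and Python results have each been exported as a mapping from bin pid to ml_analyzed. The function name `find_disagreements`, the dict-based inputs, and the 0.05 tolerance (borrowed from the criterion earlier in this thread) are all assumptions for illustration. The first bin uses the two values reported above; the second bin and its values are hypothetical, included only to show an agreeing case.

```python
def find_disagreements(matlab_results, python_results, tolerance=0.05):
    """List bins whose MATLAB and Python ml_analyzed differ by more than tolerance.

    Both arguments are dicts mapping bin pid -> ml_analyzed (assumed layout).
    Only bins present in both mappings are compared.
    """
    disagreements = []
    for pid in sorted(matlab_results.keys() & python_results.keys()):
        matlab_value = matlab_results[pid]
        python_value = python_results[pid]
        if abs(matlab_value - python_value) > tolerance:
            disagreements.append((pid, matlab_value, python_value))
    return disagreements


matlab_results = {
    "D20230727T025611_IFCB127": 2.8529,  # MATLAB result reported above
    "D00000000T000000_IFCB000": 3.1000,  # hypothetical bin, illustration only
}
python_results = {
    "D20230727T025611_IFCB127": 4.988,   # value shown on the dashboard
    "D00000000T000000_IFCB000": 3.1000,  # hypothetical bin, illustration only
}

for pid, matlab_value, python_value in find_disagreements(matlab_results, python_results):
    print(pid, matlab_value, python_value)
```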

hsosik · 2023-10-18T20:15:05Z

Here is another example from the EXPORTS data set that is also not working in the Python implementation:
https://ifcb-data.whoi.edu/bin?bin=D20210501T163341_IFCB125

IFCB dashboard shows:
Volume Analyzed: 4.978 ml

MATLAB result is:
>> IFCB_volume_analyzed('https://ifcb-data.whoi.edu/EXPORTS/D20210501T163341_IFCB125.hdr')
ans =
1.748566335416668

This is a case with many 0 values in the last two time columns of the adc file. It is supposed to be handled by a case in the code that uses only the non-zero time rows, along with a mode value of the good inhibit times for the 0 rows:
%second best estimate, last good row, plus mode as best guess for each bad row
inhibittime(count) = adc.Var24(iii(end)) + (size(adc,1)-length(iii)) * modeinhibittime-inhibittime_offset;

Is the Python code missing these cases, or did the wrong code get implemented for the new updates to the dashboard database?
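
To make the fallback concrete, here is a rough Python sketch of the "second best estimate" in the MATLAB line above: take the inhibit time from the last good (non-zero) row, then add the mode inhibit time once for each bad row. This is not the actual Python implementation. The function name, the list-based input, and the rounding are hypothetical, and it assumes that the mode is taken over the step between successive good values of the cumulative inhibit time column, which the MATLAB excerpt does not show.

```python
from statistics import mode


def estimate_inhibit_time(inhibit_times, inhibittime_offset=0.0, ndigits=4):
    """Estimate total inhibit time when some adc rows report 0.

    inhibit_times: cumulative inhibit time per adc row (assumed layout).
    Rows with a value of 0 are treated as bad.
    """
    good = [t for t in inhibit_times if t > 0]
    if not good:
        raise ValueError("no rows with a usable inhibit time")
    n_bad = len(inhibit_times) - len(good)
    if n_bad == 0:
        # Every row is usable: take the last row directly.
        return good[-1] - inhibittime_offset
    if len(good) < 2:
        raise ValueError("need at least two good rows to estimate a mode")
    # Assumption: per-trigger inhibit time is the step between successive
    # good rows; rounding makes the mode meaningful for floating-point data.
    steps = [round(later - earlier, ndigits) for earlier, later in zip(good, good[1:])]
    # Last good row, plus the mode as a best guess for each bad row.
    return good[-1] + n_bad * mode(steps) - inhibittime_offset


# Hypothetical data: four good rows followed by two bad rows reporting 0.
print(estimate_inhibit_time([0.1, 0.2, 0.3, 0.4, 0, 0]))
```

Using only the final row in a case like this would give an inhibit time of 0, which is consistent with the inflated volume described above.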

joefutrelle · 2023-12-14T14:19:28Z

Your diagnosis is correct, and adding the case brings the Python and MATLAB code into agreement for these bins. The PR is #76.

joefutrelle added a commit that referenced this issue Sep 7, 2022

implementing #70 and adjusting test data

673e897

joefutrelle mentioned this issue Sep 7, 2022

Compute ml_analyzed from adc when hdr data produces incorrect results #72

Merged

joefutrelle closed this as completed Sep 7, 2022

hsosik mentioned this issue Sep 8, 2022

ml_analyzed error for bad last line of adc files #57

Open

joefutrelle reopened this Jun 1, 2023

joefutrelle mentioned this issue Dec 14, 2023

adding missing case for inhibittime > 0 #76

Merged

joefutrelle closed this as completed in #76 Jan 2, 2024
