ml_analyzed should be computed from hdr and adc info to select the best estimate #70

Closed
hsosik opened this issue Sep 6, 2022 · 8 comments · Fixed by #76

Comments

@hsosik

hsosik commented Sep 6, 2022

Some cases have obviously bad info in the hdr file for ml_analyzed estimates; in other cases the problem is less obvious, but the value is still bad. A brute force approach is to compute ml_analyzed both ways, compare the results, and select the adc-based value whenever the difference is outside some tolerance. This presumes the adc value is more likely to be correct (which seems to be true from my inspection of results).

@hsosik
Author

hsosik commented Sep 6, 2022

A reasonable starting place for criteria and tolerance:

if abs(ml_analyzed_hdr - ml_analyzed_adc) < 0.05
    ml_analyzed = ml_analyzed_hdr;
else
    ml_analyzed = ml_analyzed_adc;
end
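
For comparison with the Python side, the same selection rule can be sketched as below. This is only an illustration of the criterion above, not the code in the repository; the function name `select_ml_analyzed` and the `tolerance` parameter are hypothetical, and 0.05 is simply the starting tolerance proposed here.

```python
def select_ml_analyzed(ml_analyzed_hdr, ml_analyzed_adc, tolerance=0.05):
    """Pick the hdr-based estimate unless it disagrees with the adc-based one.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not part of any existing API.
    """
    # Trust the hdr value only when both estimates agree within tolerance.
    if abs(ml_analyzed_hdr - ml_analyzed_adc) < tolerance:
        return ml_analyzed_hdr
    # Otherwise fall back to the adc value, presumed more likely correct.
    return ml_analyzed_adc


# Agreeing estimates keep the hdr value; a large gap selects the adc value.
agreeing = select_ml_analyzed(2.70, 2.71)
disagreeing = select_ml_analyzed(0.024627, 2.9072)
```

Keeping the tolerance as a parameter makes it easy to tighten or loosen the criterion once more datasets have been inspected.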

@hsosik
Author

hsosik commented Sep 6, 2022

Among 1074 bins in the SG2105 dataset, here are the 19 bins that have bad ml_analyzed_hdr according to the above criteria:

                pid                 ml_analyzed_hdr    ml_analyzed_adc
____________________________    _______________    _______________

{'D20210503T171928_IFCB115'}         0.88862           0.83318    
{'D20210507T225916_IFCB102'}          32.456            3.3309    
{'D20210511T220157_IFCB102'}          13.123            2.2706    
{'D20210512T155857_IFCB102'}        0.024627            2.9072    
{'D20210513T003630_IFCB102'}        0.024627            2.0426    
{'D20210513T113350_IFCB102'}        0.024627            2.6736    
{'D20210513T234948_IFCB102'}          6.5381            2.6978    
{'D20210514T090858_IFCB102'}        0.024627            2.5023    
{'D20210514T172134_IFCB102'}          2.7174            2.1922    
{'D20210515T163434_IFCB102'}           10.81            3.0479    
{'D20210515T232812_IFCB102'}           4.987            2.9775    
{'D20210518T014048_IFCB102'}          10.809            2.6808    
{'D20210518T081012_IFCB102'}          4.9725            2.6879    
{'D20210518T205045_IFCB102'}         0.49873            2.6363    
{'D20210519T152904_IFCB102'}          4.9872            2.1729    
{'D20210519T161741_IFCB102'}      0.00049828            2.3269    
{'D20210520T025630_IFCB102'}          32.687            2.7402    
{'D20210520T135944_IFCB102'}          4.9873            2.8065    
{'D20210520T182732_IFCB102'}           6.397            1.4393   

@joefutrelle
Owner

@hsosik why not simply always use the value computed from the ADC data?

@hsosik
Author

hsosik commented Sep 7, 2022 via email

@joefutrelle
Owner

Reopening this to report on recent work and close out the changes

@hsosik
Author

hsosik commented Oct 18, 2023

I don't see recent activity, so maybe this is not the correct place to post this--let me know if there's another more active issue. I see something unexpected in the results now in the IFCB dashboard database for at least one case in 2023.

For this bin:
https://ifcb-data.whoi.edu/bin?bin=D20230727T025611_IFCB127
volume analyzed is now reported as:
Volume Analyzed: 4.988 ml

My MATLAB result is as follows:
>> IFCB_volume_analyzed('https://ifcb-data.whoi.edu/mvco/D20230727T025611_IFCB127.hdr')
ans =
2.8529

It appears that the current Python code is using the two bad lines at the end of the adc file, which incorrectly report inhibittime as 0.

This difference between the MATLAB and Python results should show up in more bins under a more systematic comparison of the two, which I think we should do to make sure there are no other inconsistencies.
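
One way such a comparison could be organized is sketched below, assuming the MATLAB and Python estimates have already been collected into dictionaries keyed by bin pid. The function name `find_discrepancies` is hypothetical, and the 0.05 tolerance is borrowed from the criterion proposed earlier in this thread rather than taken from either codebase.

```python
def find_discrepancies(matlab_values, python_values, tolerance=0.05):
    """Return {pid: (matlab, python)} for bins whose estimates disagree.

    Hypothetical helper for illustration; both inputs map a bin pid to an
    ml_analyzed estimate. Only pids present in both inputs are compared.
    """
    mismatches = {}
    for pid in sorted(matlab_values.keys() & python_values.keys()):
        matlab_ml = matlab_values[pid]
        python_ml = python_values[pid]
        if abs(matlab_ml - python_ml) > tolerance:
            mismatches[pid] = (matlab_ml, python_ml)
    return mismatches


# Values for the bin reported above (MATLAB result vs. dashboard value).
matlab_values = {"D20230727T025611_IFCB127": 2.8529}
python_values = {"D20230727T025611_IFCB127": 4.988}
mismatches = find_discrepancies(matlab_values, python_values)
```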

@hsosik
Author

hsosik commented Oct 18, 2023

Here is another example from the EXPORTS data set that is also not working in the Python implementation:
https://ifcb-data.whoi.edu/bin?bin=D20210501T163341_IFCB125

IFCB dashboard shows:
Volume Analyzed: 4.978 ml

MATLAB result is:
>> IFCB_volume_analyzed('https://ifcb-data.whoi.edu/EXPORTS/D20210501T163341_IFCB125.hdr')
ans =
1.748566335416668

In this case the last two time columns in the adc file contain many 0 values. It is supposed to be handled by a case in the code that uses only the rows with non-zero times and, for each row with a 0, adds the mode value of the good inhibit times:
%second best estimate, last good row, plus mode as best guess for each bad row
inhibittime(count) = adc.Var24(iii(end)) + (size(adc,1)-length(iii)) * modeinhibittime-inhibittime_offset;

Is the Python code missing these cases, or did the wrong code get implemented for the new updates to the dashboard database?
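
For reference, the fallback in the MATLAB line above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual Python implementation: the function name, its parameters, and the `decimals` rounding are hypothetical, and computing the mode from the differences between consecutive good rows is a guess at how `modeinhibittime` is derived.

```python
from statistics import mode


def estimate_inhibit_time(inhibit_times, inhibit_offset=0.0, decimals=4):
    """Fallback total inhibit time when some adc rows report 0.

    inhibit_times: cumulative inhibit time per adc row; 0 marks a bad row.
    Hypothetical helper: the name and arguments are assumptions.
    """
    good = [t for t in inhibit_times if t != 0]
    if not good:
        raise ValueError("no rows with a non-zero inhibit time")
    n_bad = len(inhibit_times) - len(good)
    # Assumption: the typical per-row inhibit time is the mode of the
    # differences between consecutive good rows, rounded so that
    # floating-point noise does not split otherwise equal values.
    steps = [round(b - a, decimals) for a, b in zip(good, good[1:])]
    mode_step = mode(steps) if steps else 0.0
    # Last good row, plus the mode as a best guess for each bad row.
    return good[-1] + n_bad * mode_step - inhibit_offset


# Three good rows followed by two bad rows that report 0.
estimate = estimate_inhibit_time([0.08, 0.16, 0.24, 0.0, 0.0])
```

The same branch would also cover the earlier bin, where only the last two rows of the adc file report an inhibit time of 0.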

@joefutrelle
Owner

joefutrelle commented Dec 14, 2023

Your diagnosis is correct, and adding the case brings the Python and MATLAB code into agreement for these bins. The PR is #76.
