ml_analyzed should be computed from hdr and adc info to select the best estimate #70
Comments
A reasonable starting place for criteria and tolerance: if abs(ml_analyzed_hdr - ml_analyzed_adc) < 0.05,
Among 1074 bins in the SG2105 dataset, here are the 19 bins that have bad ml_analyzed_hdr according to the above criteria:
@hsosik why not simply always use the value computed from the ADC data?
The hdr value is more accurate when it is not wrong, and it is the better estimate most of the time.
Reopening this to report on recent work and close out the changes.
I don't see recent activity, so maybe this is not the correct place to post this; let me know if there's another, more active issue. I see something unexpected in the results now in the IFCB dashboard database for at least one case in 2023. For this bin: My MATLAB result is as follows: It appears that the current Python code must be using two bad lines at the end of the adc file that have inhibittime reported as 0, which is incorrect. This difference between the MATLAB and Python results should show up in more bins if we do a more systematic comparison between the two, which I think should be done to make sure there are no other inconsistencies.
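To illustrate the trailing-row problem described above, here is a minimal sketch, not the MATLAB or Python code discussed in this issue. It assumes the adc file yields cumulative run time and inhibit time in seconds for each row, that ml_analyzed is the flow rate multiplied by look time (run time minus inhibit time), and that the nominal flow rate is 0.25 mL/min. The function name and the tuple layout are hypothetical.

```python
# Assumed nominal sample flow rate; not stated in this thread.
FLOW_RATE_ML_PER_MIN = 0.25


def ml_analyzed_from_adc(rows, flow_rate=FLOW_RATE_ML_PER_MIN):
    """Estimate ml_analyzed from (run_time_s, inhibit_time_s) rows.

    `rows` is a hypothetical representation of the adc time columns:
    cumulative seconds, in file order. Rows where either time is zero
    are treated as bad and skipped when looking for the final values.
    """
    # Walk backwards so that bad trailing rows are ignored.
    for run_time, inhibit_time in reversed(rows):
        if run_time > 0 and inhibit_time > 0:
            look_time = run_time - inhibit_time  # seconds spent looking
            return flow_rate * look_time / 60.0
    raise ValueError("no row with non-zero run time and inhibit time")


rows = [
    (600.0, 60.0),
    (1200.0, 120.0),
    (1200.5, 0.0),  # bad trailing row: inhibit time reported as 0
    (1201.0, 0.0),  # bad trailing row
]
estimate = ml_analyzed_from_adc(rows)
```

Taking the last row without this check would subtract an inhibit time of 0 and overestimate the look time, which is the kind of discrepancy reported for this bin.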
Here is another example from the EXPORTS data set that is also not working in the Python implementation: IFCB dashboard shows: MATLAB result is: This is a case with many 0 values in the last two time columns in the adc file, and it is supposed to be handled by a case in the code that uses only the non-zero time rows along with a mode value of the good inhibit times for the 0 rows: Is the Python code missing these cases, or did the wrong code get implemented for the new updates to the dashboard database?
Your diagnosis is correct, and adding the case brings the Python and MATLAB code into agreement for these bins. The PR is #76.
Some cases have obviously bad info in the hdr file for ml_analyzed estimates; other cases are less obvious but still bad. A brute-force approach is to compute ml_analyzed both ways, compare the results, and select the adc-based value in cases where the difference is outside some tolerance. This presumes the adc value is more likely to be correct (which seems to be true from my inspection of results).
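The compare-and-select approach can be sketched as follows. This is only an illustration: the function name and return shape are hypothetical, and the default tolerance of 0.05 is the starting value suggested in the comments, not a validated threshold.

```python
def select_ml_analyzed(ml_analyzed_hdr, ml_analyzed_adc, tolerance=0.05):
    """Pick the ml_analyzed estimate to report for a bin.

    The hdr value is kept when it agrees with the adc value to within
    `tolerance`; otherwise the adc value is used, on the assumption that
    it is the more likely of the two to be correct when they disagree.
    Returns a (value, source) tuple so the choice can be audited.
    """
    if abs(ml_analyzed_hdr - ml_analyzed_adc) < tolerance:
        return ml_analyzed_hdr, "hdr"
    return ml_analyzed_adc, "adc"


# Values below are made up for illustration only.
agree = select_ml_analyzed(4.80, 4.82)
disagree = select_ml_analyzed(0.50, 4.82)
```

Returning the source alongside the value makes it straightforward to list the bins where the hdr estimate was rejected.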