Develop a plan for GDAC QC #380

Open
kbailey-noaa opened this issue Oct 2, 2024 · 3 comments

Comments

@kbailey-noaa

What can be improved? What is missing?

FY24 GDAC SOW:
Per IOOS Certification requirements, data served via the GDAC must be quality controlled, whether by routines applied by the provider or by the GDAC. The GDAC must apply QARTOD to variables that have existing QARTOD manuals (e.g. Manual for QC of Glider Temperature and Salinity Data). Flags must be published in the data files following IOOS Metadata standards (ioos.github.io).

The GDAC Team shall develop a plan for full implementation of QC in the GDAC using the following tiered approach:
Near-term: implement or repair existing Required and Strongly Recommended QARTOD tests, using global thresholds for all GDAC data, and publish aggregate QC flags (qc_agg).
Mid-term: Use QARTOD manuals and standards to understand and identify the potential for improvements to existing tests (e.g. regional refinement of thresholds, additional real-time tests, etc).
Long-term: Engage a scientific working group, possibly under UG2, to investigate the potential for delayed-mode quality control and/or to gain feedback on additional QC considerations.

  1. Document an implementation plan that addresses the above 3 aspects of QC, which includes what we're doing now, gaps that remain, and what we intend to do, for both real-time and delayed mode datasets. Consider this plan to be a proposal that we'd run by the community.

  2. Identify a forum to document this plan. I highly discourage Word documents, since those aren't interactive. Options are the GitHub wiki (I can activate this), some type of Slack feature (?), the GitHub Discussions option...something else?

@leilabbb

The Near-Term goal has been implemented in the GDAC:

About implementing or repairing existing required and/or strongly recommended QARTOD tests:

:: Five geophysical variables are quality-controlled using the IOOS-QC QARTOD modules:
Temperature
Conductivity
Density
Pressure
Salinity

:: Five test functions are used to perform quality control on the geophysical variables.
Gross Range Test
Spike Test
Rate of Change Test
Flat Line Test
Aggregate Quality Flag
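
To illustrate the kind of check these tests perform, here is a minimal pure-Python sketch of a gross range test. It is a simplified illustration, not the IOOS-QC implementation, and the function name `gross_range_flags` and the temperature spans are assumptions made up for the example. The flag values follow the standard QARTOD convention (1 good, 2 unknown, 3 suspect, 4 fail, 9 missing).

```python
# QARTOD flag values (standard QARTOD convention).
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span=None):
    """Flag each value against a fail span and an optional suspect span."""
    flags = []
    for v in values:
        if v is None or v != v:  # None or NaN is treated as missing
            flags.append(MISSING)
        elif not (fail_span[0] <= v <= fail_span[1]):
            flags.append(FAIL)
        elif suspect_span and not (suspect_span[0] <= v <= suspect_span[1]):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


# Hypothetical temperature spans in degrees Celsius, for illustration only.
temperature = [12.5, 31.0, -6.0, float("nan"), 45.0]
print(gross_range_flags(temperature, fail_span=(-5, 40), suspect_span=(0, 30)))
```

In the GDAC itself the spans would come from the configuration file rather than being hard-coded.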

:: The QARTOD location test is customized to align with the parameters available in the gliders' profile files. The two variables used in the location test are:
Longitude
Latitude

About using global thresholds for all GDAC data.
:: The GDAC uses a configuration file to set up the global thresholds for the QARTOD tests.

:: The thresholds for the spike test and the rate of change test are updated during the QC process to align with the datasets' ranges.

About publishing aggregate QC flags (qc_agg).
:: The flags are calculated using the QARTOD compare function, which applies a priority order to the individual test results to produce one final quality flag per data point for a geophysical variable.
priority list:
{
QartodFlags.MISSING,
QartodFlags.UNKNOWN,
QartodFlags.GOOD,
QartodFlags.SUSPECT,
QartodFlags.FAIL,
}
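
As a rough sketch of how such a priority list can be applied, the pure-Python example below aggregates the flags from several tests point by point. It assumes that flags later in the list override earlier ones, so FAIL wins over everything else. The function name `aggregate_flags` is hypothetical and this is not the IOOS-QC code itself.

```python
# QARTOD flag values (standard QARTOD convention).
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed interpretation: later entries take precedence over earlier ones.
PRIORITY = [MISSING, UNKNOWN, GOOD, SUSPECT, FAIL]


def aggregate_flags(flag_vectors):
    """Combine per-test flag lists into one aggregate flag list."""
    n = len(flag_vectors[0])
    if any(len(v) != n for v in flag_vectors):
        raise ValueError("all flag vectors must have the same length")
    result = [MISSING] * n
    for flag in PRIORITY:
        for vector in flag_vectors:
            for i, f in enumerate(vector):
                if f == flag:
                    result[i] = flag
    return result


# Made-up results from two tests on the same four data points.
gross_range = [GOOD, GOOD, FAIL, MISSING]
spike = [UNKNOWN, SUSPECT, GOOD, MISSING]
print(aggregate_flags([gross_range, spike]))
```

With this ordering a point is only reported as GOOD when no test marked it SUSPECT or FAIL.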

@leilabbb

leilabbb commented Dec 18, 2024

The main Mid-term goal has been implemented in the GDAC:

MAIN
About using the QARTOD manuals.

:: The IOOS-QC module's functions were developed based on the QARTOD manuals and are used by GDAC to generate data quality flags. Links to the test functions used to generate the flags are listed here:

Gross Range Test
Spike Test
Rate of Change Test
Flat Line Test
Aggregate Quality Flag
Location Test. This QARTOD function is customized to work with the glider profile files.

About using standards to understand existing tests.

:: A [configuration file](https://github.com/ioos/glider-dac/blob/main/data/qc_config.yml) is used to set thresholds as function arguments for the GDAC QC tests. Consult the list of links below for more information.

  • The Gross Range and Flat Line Tests use global ranges to define the quality flags.
  • The Rate of Change and Spike Tests use the get_rate_of_change_threshold and get_spike_threshold functions, developed by the GDAC, to calculate the QARTOD thresholds.
  • The Location Test uses a conditional statement, developed by the GDAC, to assign the QARTOD flags. It reports on data issues such as:
    • out-of-range profile_lat and profile_lon
    • missing profile_lat and profile_lon
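
To show how thresholds passed as function arguments drive a test, here is a simplified pure-Python sketch of a spike test that compares each point with the average of its two neighbors. It is an illustration only: it is not the IOOS-QC implementation, it does not reproduce get_spike_threshold, and the function name `spike_flags` and the threshold values are assumptions.

```python
# QARTOD flag values used in this sketch.
GOOD, UNKNOWN, SUSPECT, FAIL = 1, 2, 3, 4


def spike_flags(values, suspect_threshold, fail_threshold):
    """Flag points that deviate from the mean of their two neighbors."""
    # Endpoints lack two neighbors, so they stay UNKNOWN.
    flags = [UNKNOWN] * len(values)
    for i in range(1, len(values) - 1):
        reference = (values[i - 1] + values[i + 1]) / 2
        diff = abs(values[i] - reference)
        if diff > fail_threshold:
            flags[i] = FAIL
        elif diff > suspect_threshold:
            flags[i] = SUSPECT
        else:
            flags[i] = GOOD
    return flags


# Made-up series with a single spike; thresholds are illustrative.
series = [10.0, 10.0, 13.0, 10.0, 10.0]
print(spike_flags(series, suspect_threshold=1.0, fail_threshold=2.0))
```

Note that with this neighbor-average method the points next to a large spike can also be flagged as suspect, which is one reason the threshold choice matters.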

About identifying additional real-time tests.

:: The GDAC has developed new functions or tests to:

  • Check the time array for:
    • missing deployment start time
    • deployment time not in %Y%m%dT%H%M%S format
    • start time precedes deployment time
    • invalid timestamps (masked, NaNs, zeros, Fill Values)
    • duplicate timestamps
    • out-of-order timestamps
  • Check the geophysical data arrays for:
    • missing variable
    • switched valid_min and valid_max values
    • invalid standard name
    • a standard name shared with another variable
    • invalid units
    • too few valid points
    • invalid data (masked array, unique values, all NaNs, all Fill Values)
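
A few of the time checks listed above can be sketched in plain Python as follows. This is a hedged illustration under stated assumptions, not the GDAC code: the function names, the issue strings, and the fill value of -999.0 are all made up for the example.

```python
import math
from datetime import datetime


def is_valid_deployment_time(text):
    """Check that text matches the %Y%m%dT%H%M%S format exactly."""
    # strptime tolerates some unpadded fields, so also require the full length.
    if not isinstance(text, str) or len(text) != 15:
        return False
    try:
        datetime.strptime(text, "%Y%m%dT%H%M%S")
    except ValueError:
        return False
    return True


def check_time_array(times, fill_value=-999.0):
    """Return a list of issues found in a sequence of numeric timestamps."""
    issues = []
    valid = [
        t for t in times
        if t is not None and not math.isnan(t) and t != 0 and t != fill_value
    ]
    if len(valid) < len(times):
        issues.append("invalid timestamps")
    if len(set(valid)) < len(valid):
        issues.append("duplicate timestamps")
    if any(later < earlier for earlier, later in zip(valid, valid[1:])):
        issues.append("out-of-order timestamps")
    return issues


print(is_valid_deployment_time("20241002T153000"))
print(check_time_array([100.0, 160.0, 160.0, 130.0, float("nan"), 0.0]))
```

Collecting the issues as strings makes it straightforward to write them into a summary report afterwards.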

:: The GDAC has implemented the dac_qc_comment:

  • The QC process generates a report summarizing major data issues identified during QARTOD test implementation.

ADDITIONAL
About potential for improvements.

:: Add regional refinement of thresholds: This is an additional level of quality control that requires further information before implementation.

For example, a table with thresholds by region needs to be created.
A region is defined by a geographical bounding box ([min_lat, max_lat], [min_lon, max_lon]).
A region may also be subdivided by a vertical depth range ([min_dep, max_dep]).
A threshold is then created based on both geographical and vertical classifications.
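
One possible shape for such a table is sketched below, purely as a starting point for discussion. Every name and number in it is hypothetical (the region names, bounding boxes, depth ranges, and spans are invented), and falling back to a global span when no region matches is an assumption rather than an agreed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalThreshold:
    name: str
    lat_range: tuple    # (min_lat, max_lat)
    lon_range: tuple    # (min_lon, max_lon)
    depth_range: tuple  # (min_dep, max_dep) in meters
    fail_span: tuple    # (min, max) valid values for the variable


# Hypothetical rows: one region split into two depth layers.
TABLE = [
    RegionalThreshold("example_shelf_surface", (35.0, 42.0), (-76.0, -68.0),
                      (0.0, 50.0), (0.0, 32.0)),
    RegionalThreshold("example_shelf_deep", (35.0, 42.0), (-76.0, -68.0),
                      (50.0, 1000.0), (2.0, 20.0)),
]

# Hypothetical global span used when no region matches.
GLOBAL_FAIL_SPAN = (-5.0, 40.0)


def lookup_fail_span(lat, lon, depth, table=TABLE):
    """Return the span of the first matching region, else the global span."""
    for row in table:
        if (row.lat_range[0] <= lat <= row.lat_range[1]
                and row.lon_range[0] <= lon <= row.lon_range[1]
                and row.depth_range[0] <= depth < row.depth_range[1]):
            return row.fail_span
    return GLOBAL_FAIL_SPAN


print(lookup_fail_span(38.0, -72.0, 10.0))
print(lookup_fail_span(38.0, -72.0, 200.0))
print(lookup_fail_span(10.0, -150.0, 10.0))
```

Keeping the table as data rather than code would let regions be added or refined without touching the test functions.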

:: Add dataset quality control dependency: This is another level of quality control that requires checking the flags of interdependent datasets and applying an attribute to reject or accept a dataset. This is mostly for the GTS application (see issue #391).

@leilabbb

About the long-term goal:

Engage a scientific working group, possibly under UG2, to investigate the potential for delayed-mode quality control and/or to gain feedback on additional QC considerations.
