Because GTFS data consumers and producers rely on the validator, it is important to know whether a pull request introduces a breaking change (i.e. the proposed validator declares previously valid datasets invalid). If this step is skipped, datasets newly declared invalid could be rejected by GTFS data consumers (e.g. Transit App, Google Maps), which could cause public transit systems to disappear from their interfaces, leaving riders unable to access the trip information they are used to getting on these platforms.
- The reference validator is defined as the latest version of the validator available on the `master` branch of this repository.
- The proposed validator is defined as the version of the validator that results from the changes introduced in the proposed pull request.
- The acceptance criteria (mentioned in the diagram below) are defined as the impact that a pull request has on datasets: does the pull request disrupt a large quantity of datasets? If yes, the pull request should be flagged as introducing breaking changes or rejected; if no, the pull request can be safely merged to the `master` branch.
For the latest version of every GTFS dataset from the MobilityDatabase, the validation reports from the proposed and the reference validator are compared. An acceptance test report is generated: for each agency/dataset it quantifies the number of new notices (as defined here) that have been introduced.
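As an illustration, counting "new" notices for a single dataset amounts to comparing, per notice code, the occurrence counts found in the reference report and in the proposed report. The sketch below is a simplified reading of that comparison, not the actual `output-comparator` implementation; the `countNewNotices` helper and the hard-coded counts are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch: counts notices that appear (or appear more often) in the proposed report. */
public class NewNoticeCounter {

  /**
   * Returns, per notice code, how many extra occurrences the proposed report contains
   * compared with the reference report for the same dataset.
   */
  public static Map<String, Integer> countNewNotices(
      Map<String, Integer> referenceCounts, Map<String, Integer> proposedCounts) {
    Map<String, Integer> newNotices = new HashMap<>();
    for (Map.Entry<String, Integer> entry : proposedCounts.entrySet()) {
      int before = referenceCounts.getOrDefault(entry.getKey(), 0);
      int delta = entry.getValue() - before;
      if (delta > 0) {
        newNotices.put(entry.getKey(), delta);
      }
    }
    return newNotices;
  }

  public static void main(String[] args) {
    // Hypothetical counts: reference.json had 1 occurrence, latest.json has 4.
    Map<String, Integer> reference = Map.of("some_notice_code", 1);
    Map<String, Integer> proposed = Map.of("some_notice_code", 4, "other_notice_code", 2);
    System.out.println(countNewNotices(reference, proposed));
    // e.g. {some_notice_code=3, other_notice_code=2} (map order may vary)
  }
}
```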
The logic for this process is defined in `acceptance_test.yml`.
This workflow:
- packages the `output-comparator` module;
- packages the proposed version of the validator;
- downloads the version of the reference validator that is on the `master` branch;
- defines a matrix of urls (fetched from the MobilityDatabase) that will be used in the further validation process.
On each of these urls:
- the reference version of the validator is executed and the validation report is output as JSON (under `reference.json`);
- the proposed version of the validator is executed and the validation report is output as JSON (under `latest.json`).
At the end of the execution of the two aforementioned steps for every url in the matrix, all the validation reports are gathered in a single folder (`output`) and compared; the percentage of newly invalid datasets is output to the console. The final acceptance test report is output at `acceptance_report.json`.
It includes a summary of both new notice types and dropped notice types. It also contains a list of "corrupted" sources: sources that could not be taken into account while generating the acceptance test report because of I/O errors or missing files.
Finally, a comment that sums up the acceptance test result is posted on the PR.
Example output:
acceptance_report.json
```json
{
  "newErrors": [
    {
      "noticeCode": "first_notice_code",
      "affectedSourcesCount": 2,
      "affectedSources": [
        {
          "sourceId": "source-id-1",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-1",
          "noticeCount": 4
        },
        {
          "sourceId": "source-id-2",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-2",
          "noticeCount": 6
        }
      ]
    },
    {
      "noticeCode": "second_notice_code",
      "affectedSourcesCount": 1,
      "affectedSources": [
        {
          "sourceId": "source-id-5",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-5",
          "noticeCount": 5
        }
      ]
    }
  ],
  "droppedErrors": [
    # Same schema as `newErrors`
  ],
  "newWarnings": [
    # Same schema as `newErrors`
  ],
  "droppedWarnings": [
    # Same schema as `newErrors`
  ],
  "newInfo": [
    # Same schema as `newErrors`
  ],
  "droppedInfo": [
    # Same schema as `newErrors`
  ],
  "corruptedSources": {
    "corruptedSources": [
      "source-id-1",
      "source-id-2"
    ],
    "sourceIdCount": 1245,
    "aboveThreshold": false,
    "corruptedSourcesCount": 2,
    "maxPercentageCorruptedSources": 2
  }
}
```
Each source id value comes from the MobilityDatabase: it is a unique property used to identify each source of data.
The source id can be used to find all dataset versions of a source on the MobilityDatabase for the sake of debugging or exploration.
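The `aboveThreshold` flag in the example above indicates whether the share of corrupted sources is large enough to invalidate the run. The exact rule is not spelled out in this document; a plausible reading of the report fields, shown as a minimal sketch, compares the percentage of corrupted sources against `maxPercentageCorruptedSources`:

```java
public class CorruptedSourcesCheck {
  /**
   * Sketch of the threshold check suggested by the report fields: with 2 corrupted
   * sources out of 1245 (~0.16%) and a 2% ceiling, aboveThreshold stays false.
   * This is an interpretation of the example values, not the workflow's actual code.
   */
  public static boolean aboveThreshold(
      int corruptedSourcesCount, int sourceIdCount, double maxPercentageCorruptedSources) {
    double percentage = 100.0 * corruptedSourcesCount / sourceIdCount;
    return percentage > maxPercentageCorruptedSources;
  }

  public static void main(String[] args) {
    System.out.println(aboveThreshold(2, 1245, 2)); // false
  }
}
```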
There are two main metrics added to the acceptance report comment at the PR level: Validation Time and Memory Consumption. The performance metrics are not blockers, as performance may vary due to external factors, including GitHub infrastructure performance. However, large jumps in performance values should be investigated before approving a PR.
The validation time section consists of general metrics such as average, median, standard deviation, minimum, and maximum. These metrics can be affected by the addition of new validators that introduce a penalty in processing time.
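As a rough illustration, these statistics can be computed over the per-dataset validation times of each branch and compared side by side. The helper below is a minimal sketch with hypothetical numbers; it is not the code the workflow actually runs.

```java
import java.util.Arrays;

/** Minimal sketch of the summary statistics reported for validation time. */
public class ValidationTimeStats {

  static double mean(double[] secs) {
    return Arrays.stream(secs).average().orElse(0);
  }

  static double median(double[] secs) {
    double[] sorted = secs.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  }

  static double stdDev(double[] secs) {
    double m = mean(secs);
    return Math.sqrt(Arrays.stream(secs).map(t -> (t - m) * (t - m)).average().orElse(0));
  }

  public static void main(String[] args) {
    // Hypothetical per-dataset validation times (seconds) for one branch.
    double[] times = {2.1, 3.4, 2.8, 10.5, 2.9};
    System.out.printf("avg=%.2f median=%.2f stddev=%.2f min=%.2f max=%.2f%n",
        mean(times), median(times), stdDev(times),
        Arrays.stream(times).min().orElse(0), Arrays.stream(times).max().orElse(0));
  }
}
```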
There are two main patterns for taking a memory usage snapshot (see the sketch after this list):
- MemoryMonitor annotation: this annotation persists the memory usage of the target method. As a limitation, for methods that have concurrent thread executions, the annotation produces multiple snapshots, which can make memory usage harder to analyze.
- MemoryUsageRegister: using the registry directly gives you more flexibility than the annotation and can be used in cases where MemoryMonitor produces multiple entries on concurrently executed methods.
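The snippet below is only an illustrative, self-contained sketch of the difference between the two patterns; the snapshot type and registry it defines are stand-ins, not the validator's actual `MemoryMonitor`/`MemoryUsageRegister` API, so check the real sources before relying on any names or signatures.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Self-contained illustration of the two snapshot patterns described above.
 * Snapshot and the static REGISTRY are hypothetical stand-ins, not the
 * validator's MemoryMonitor / MemoryUsageRegister API.
 */
public class MemoryPatternsSketch {

  record Snapshot(String key, long usedBytes) {}

  static final List<Snapshot> REGISTRY = new ArrayList<>();

  /** Direct-registry pattern: one explicit, named snapshot per point of interest. */
  static void recordSnapshot(String key) {
    Runtime rt = Runtime.getRuntime();
    REGISTRY.add(new Snapshot(key, rt.totalMemory() - rt.freeMemory()));
  }

  public static void main(String[] args) {
    // Annotation pattern (not shown): an aspect would record a snapshot around every
    // invocation of the annotated method, so concurrent executions yield multiple
    // entries for the same key, which is what makes the report harder to read.
    recordSnapshot("GtfsFeedLoader.loadTables");
    // ... load tables, run validators ...
    recordSnapshot("GtfsFeedLoader.executeMultiFileValidators");
    REGISTRY.forEach(s -> System.out.println(s.key() + " -> " + s.usedBytes() + " bytes"));
  }
}
```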
The memory consumption section contains three tables.
- The first lists the first 25 datasets with the largest memory increase compared with the main branch.
- The second lists the first 25 datasets with the largest memory decrease compared with the main branch.
- The third (not always visible) lists the first 25 datasets that could not be compared because the main branch did not contain the memory usage information.
Memory usage is collected at critical points and persisted in the JSON report. The added snapshot points are:
- GtfsFeedLoader.loadTables: This is taken after the validator loads all files.
- GtfsFeedLoader.executeMultiFileValidators: This is taken after the validator has executed all multi-file validators.
- org.mobilitydata.gtfsvalidator.table.GtfsFeedLoader.loadAndValidate: This is taken for the complete load and validation method.
- ValidationRunner.run: This is taken for the complete run of the validator, excluding report generation.
We follow this process:
- Provide code changes by creating a new PR on the GitHub repository;
- The acceptance test pipeline will run each time code is pushed on the newly created branch, except if the keyword `[acceptance test skip]` is included in the commit message;
- Download all validation reports from the artifacts listed for the specific GitHub run;
- One can verify that the count of validation reports (1 per source) matches the number of sources announced in the GitHub PR comment (see the sketch after this list);
- Select a sample of validation reports and compare them manually. MobilityData uses an internal tool to do so; we will open source it in the future.
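As a quick sanity check for the verification step above, the downloaded artifact can be scanned to count report files, and the count compared against the number of sources announced in the PR comment. The sketch below is only a convenience example; the layout it assumes (one JSON report per source inside an `output` folder) is an assumption, not a documented guarantee.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: counts validation reports in a downloaded artifact folder. */
public class ReportCountCheck {
  public static void main(String[] args) throws IOException {
    // Assumed layout: the unzipped artifact contains one JSON report per source.
    Path artifactDir = Path.of(args.length > 0 ? args[0] : "output");
    long reportCount;
    try (Stream<Path> files = Files.walk(artifactDir)) {
      reportCount = files.filter(p -> p.toString().endsWith(".json")).count();
    }
    // Compare this number against the source count announced in the PR comment.
    System.out.println("Validation reports found: " + reportCount);
  }
}
```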