Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate tests to GitHub Actions #3430

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

mwiencek
Copy link
Member

Problem

  • Our Selenium tests (on Jenkins) are slow. They used to take up to an hour, though since moving to a newer server, they take under 30 minutes.
  • Spreading our test workflows across different services (CircleCI and Jenkins) isn't ideal, but CircleCI doesn't seem to have enough or as much free resources as GitHub Actions to run our Selenium tests there.
  • All of our Jenkins configuration is stored inside Jenkins as opposed to version control. We never looked into converting things to use Jenkinsfiles/pipelines.
  • Jenkins requires additional maintenance (of our repository, containers, upgrades to plugins and Jenkins itself).
  • Having to manually build and push new musicbrainz-tests image versions whenever changes are made to Dockerfile.tests is annoying.
  • Our JavaScript code coverage reports are incomplete: they only include coverage from the Selenium tests.

Solution

  • Splits the Selenium tests into four parallel jobs which can be re-run individually.
  • Moves all of our tests away from CircleCI/Jenkins and onto GitHub Actions.
  • Combines the JS, Perl, pgTAP, and Selenium tests into a single CI workflow stored in git (.github/workflows/ci.yml).
  • Allows us to move all other jobs running on Jenkins to GitHub Actions in the future, so that we can retire Jenkins for good.
  • Builds the tests image as part of the CI workflow (we no longer have to manually build and push tests images).
  • Extracts JS coverage from our t/web.js and Perl tests, and merges it with the coverage from our Selenium tests.

Other changes:

  • Converts Dockerfile.tests to a multi-stage build.

One downside I'll note is that we can no longer SSH into running jobs (like on CircleCI) by default. There might be a way to emulate this feature if we need it.

Testing

Just the new GitHub Actions workflow!

Recent Selenium builds have been failing with

  WebDriverError: disconnected: not connected to DevTools
    (failed to check if window was closed: disconnected: not connected to DevTools)

even though we haven't changed anything that I'm aware of. Trying to update our
Chrome dependencies here to see if that helps in any way.

Also bump selenium-webdriver to 4.27.0; I've combined it with this commit
because the changelog suggests it's somewhat tied to specific Chrome versions:
https://github.com/SeleniumHQ/selenium/blob/trunk/javascript/node/selenium-webdriver/CHANGES.md
This started failing during CI runs recently, and I have no clue why. (If you
run the tests manually in your browser, it will also fail if you click on the
page before it executes.)

I'm just removing the test because the old autocomplete code is on the way out
anyway.
We'd like to move Selenium tests away from Jenkins (which we moved to in
machine with heavy MBS traffic, which can cause test slowdowns and flakiness.
Maintaining Jenkins also isn't free and requires us to store some CI
configuration outside of git.

GitHub Actions looks like a suitable alternative for us, as its default runners
provide more vCPUs and RAM (4 + 16GB) than CircleCI's (2 + 4GB), while also
providing unlimited build minutes. Since we already use GitHub, that's one less
service for us to rely on, and GHA integrates better with GitHub.

The main disadvantage I could find is that there's no built-in way to SSH into
a running tests container on GHA.
See previous commit, "Migrate from CircleCI to GitHub Actions."
It's now symlinked from the GitHub Actions workspace.
@mwiencek
Copy link
Member Author

This was (finally) passing at https://github.com/metabrainz/musicbrainz-server/actions/runs/12457761240 but there are probably (definitely) some issues with running them as part of a pull request that I'll need to address. 😛

@mwiencek mwiencek marked this pull request as draft December 22, 2024 22:28
HACKING-PROD.md Outdated
`/home/musicbrainz/musicbrainz-server/`, and then you can run any test you want to check like this:

$ sudo -E -H -u musicbrainz carton exec -- prove -lv t/tests.t :: --tests Failing::Test
Finally, you will need to update the `musicbrainz-tests` image version in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Reminder to myself that this documentation needs to be updated again.)

The metabrainz/musicbrainz-tests image will be built and cached prior to the
two test jobs running. It no longer need to be built and pushed by hand in
advance.
This silences progress bar log spam in the CI runs.
Copy the git checkout, create the test database, and compile static resources
within the Dockerfile rather than as part of the workflow job steps.

This will speed up each job by reducing the number of redundant steps.
This will be used in a subsequent commit to merge the coverage data with that
from our Selenium tests
Up until this point, a downside of our JavaScript coverage reports was that
they only included the Selenium tests. It'd be very helpful if we could also
include coverage from our t/web.js test runs. This merges them together in a
separate GitHub Actions job, using artifacts to pass the istanbul coverage data
between jobs.
So it can be imported in files executed via bin/sucrase-node.
These take a long time to complete, and that time will only increase as we add
more and more browser tests.

Add a way to run partitions/subsets of the tests in separate jobs. We use
greedy number partitioning [1] to partition the tests using their number of
commands as their size.

[1] https://en.wikipedia.org/wiki/Greedy_number_partitioning
Since there appears to be an actual *documented* way to cleanup images produced
by our tests via the GHCR API, use that instead.
Many Perl tests make requests to pages rendered by the Node template-renderer.
Prior to this commit, we weren't dumping istanbul coverage data from those
tests. (The coverage is only dumped when the template-renderer process is
cleanly shut down.)
There's no need to leave the Flow server running once we're done with it.
This comment wasn't entirely correct, as generating cpanfile.snapshot had
nothing to do with Chrome updating. That was a side effect of rebuilding
Dockerfile.tests, not cpanfile.snapshot. And it's no longer an issue since
da2d499 anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant