
Failing release-docs workflow for release 0.22.0 #1649

Closed
gbolmier opened this issue Nov 24, 2024 · 15 comments

@gbolmier
Member

I've been trying to publish the new 0.22.0 release today but have been facing some issues. All the workflows are now running properly, except the release-docs one.

The docs environment installs spaCy for the sentence-classification notebook example; unfortunately, spaCy does not support Python 3.13 yet (see explosion/spaCy#13658), apparently because of its dependency on srsly (see explosion/srsly#112).

I tried temporarily downgrading the Python version in the docs workflows (dev-docs and release-docs), but the release-docs one still fails (see run).
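
For reference, that temporary downgrade is just a pin in the workflow's Python setup step; a minimal sketch, assuming actions/setup-python (not the actual docs workflow file):

  - uses: actions/setup-python@v5
    with:
      python-version: "3.12"  # pinned below 3.13 until spaCy/srsly support it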

I may not have the time to investigate further tomorrow, so if someone wants to give it a try before then, please do.
Cc @AdilZouitine, @MaxHalford, @smastelini, @agriyakhetarpal

@MaxHalford
Member

Hey @gbolmier! Thanks for taking a look. Honestly, I would say it's OK to drop that last part of the notebook that uses spaCy. It's not a big deal. We can always revisit and add it back later.

gbolmier reopened this Nov 24, 2024
@gbolmier
Member Author

In the end, I didn't have to drop the spaCy part; incrementing the Poetry cache key in the install-env action fixed it. The release should be out in 3-4 hours (the time for the PyPI job to finish).
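
For anyone hitting a similar stale-cache failure: the fix amounts to bumping the version segment of the cache key so the previously cached environment is discarded. A minimal sketch, assuming actions/cache (the step name and key layout are illustrative, not River's actual install-env action):

  - name: Cache virtualenv
    uses: actions/cache@v4
    with:
      path: .venv
      # Bumping "v1" to "v2" in the key invalidates the stale cached
      # environment and forces a fresh install on the next run.
      key: venv-v2-${{ runner.os }}-${{ hashFiles('poetry.lock') }}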

@MaxHalford
Member

Great job @gbolmier, thank you very much 🙏

@agriyakhetarpal
Contributor

Thank you very much indeed, @gbolmier! Great to see v0.22.0 tagged :D

@gbolmier
Member Author

Thanks guys! Ran into some further issues with the PyPI workflow, but hopefully the current run is the right one 🤞

@agriyakhetarpal
Contributor

I'd recommend splitting off the QEMU-based builds into a separate job or adding a matrix, that might be faster for future iterations ;) Also, setting CIBW_BUILD_VERBOSITY to 2, or changing the build frontend to pypa/build (which is going to become the canonical pathway in the future), would display the build steps a bit better.
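
For illustration, both knobs are plain cibuildwheel settings in the build step's env block (a sketch, not the project's actual workflow):

  env:
    CIBW_BUILD_VERBOSITY: 2      # echo the build steps in the logs
    CIBW_BUILD_FRONTEND: build   # use pypa/build instead of pip to build wheels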

@agriyakhetarpal
Contributor

Another thing: I notice that the wheel builds are running on multiple images of the same operating system. I understand keeping the Windows 2019/2022 and macOS 13/14 GHA images as separate builds because of the different compiler toolchains (MSVC/Xcode) installed across the images (which might be slightly redundant too, though). However, the trio of Ubuntu 20.04/22.04/24.04 is most likely redundant, because cibuildwheel is going to spin up a manylinux (2014 or higher) Docker container to build the wheels in anyway, which will be the same across the images, so it could be useful to keep just Ubuntu 24.04. Please feel free to decline these suggestions in case y'all feel they are unsolicited in any way :)

@gbolmier
Member Author

gbolmier commented Nov 25, 2024

Great suggestions, @agriyakhetarpal, thanks for sharing! I'm actually not used to working on this kind of stuff, so any feedback is quite valuable to me.

I'd recommend splitting off the QEMU-based builds into a separate job or adding a matrix, that might be faster for future iterations ;)

Our workflow implementation uses a matrix to parallelize the builds by OS. QEMU is used for the Ubuntu builds. The current bottleneck is the musllinux_aarch64 builds, which take around an hour each to run (times four Python versions, sequentially). I'm thinking it could make sense to add a python-version variable to the matrix strategy so that the long builds can run in parallel. Any better idea?
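
For concreteness, a minimal sketch of that python-version axis (hypothetical job and variable names, not the actual pypi.yml):

  build_linux_wheels:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: ["cp310", "cp311", "cp312", "cp313"]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3  # emulation for the aarch64 builds
      - uses: pypa/cibuildwheel@v2
        env:
          # one Python version per job, so the slow QEMU builds run in parallel
          CIBW_BUILD: "${{ matrix.python }}-*"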

Also, setting CIBW_BUILD_VERBOSITY to 2, or changing the build frontend to pypa/build (which is going to become the canonical pathway in the future), would display the build steps a bit better.

+1 for switching to pypa/build.

the trio of Ubuntu 20.04/22.04/24.04 is most likely redundant, because cibuildwheel is going to spin up a manylinux (2014 or higher) Docker container to build the wheels in anyway, which will be the same across the images, so it could be useful to keep just Ubuntu 24.04.

Oh, good catch! If the wheels are the same then yeah, let's only keep ubuntu-latest (it will be 24.04 in a few days, see actions/runner-images#10636)?

Hopefully the current workflow run finishes successfully in an hour and I can open a PR for the next time. Does that sound good @MaxHalford?

@agriyakhetarpal
Contributor

Great suggestions, @agriyakhetarpal, thanks for sharing! I'm actually not used to working on this kind of stuff, so any feedback is quite valuable to me.

I'm glad you like it! I'm quite used to doing things like this, so happy to look around and offer more!

Oh, good catch! If the wheels are the same then yeah, let's only keep ubuntu-latest (will be 24-04 in a few days, see actions/runner-images#10636)?

Sounds good to me, and yes, removing the explicit pin and using ubuntu-latest would also be fine, because the manylinux images would carry their own RHEL-based toolset.

I'm thinking it could make sense to add a python-version variable to the matrix strategy so that the long builds can run in parallel. Any better idea?

How about we parallelise at a finer granularity, i.e., both at the Python-version level and at the glibc/musl level (for Linux)? We can do this with a python-version variable in the matrix strategy, and also include the architecture being built, selected via the CIBW_BUILD environment variable. We can have twelve jobs in a workflow running simultaneously, so, if we split it like this, we get: Python 3.10–3.13 x (glibc and musl) on Linux = 4 x 2 = 8 jobs; Windows i686 and x86_64 kept in the same job as-is (it's already fast enough: around ten minutes for eight wheels, with two jobs for Windows 2019 and 2022); and macOS as-is too (two jobs, for macOS 13 and 14). That gives us a total of 8 + 2 + 2 = 12 jobs, as needed. Even if we get eight runners assigned at once instead of twelve because of a different PR/merge happening, it would still be a net benefit for the Linux jobs in comparison to parallelising just at the Python-version level.
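
A minimal sketch of the Linux part of that split (hypothetical matrix variable names; cibuildwheel's build identifiers follow the cpXY-manylinux_*/cpXY-musllinux_* naming):

  strategy:
    matrix:
      python: ["cp310", "cp311", "cp312", "cp313"]
      libc: ["manylinux", "musllinux"]  # glibc vs musl: 4 x 2 = 8 jobs
  steps:
    - uses: pypa/cibuildwheel@v2
      env:
        # e.g. "cp312-musllinux_*" selects only the musl builds of Python 3.12
        CIBW_BUILD: "${{ matrix.python }}-${{ matrix.libc }}_*"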

Hopefully the current workflow run finishes successfully in an hour and I can open a PR for the next time

Happy to review that PR or more if you'd like me to!

P.S. cibuildwheel has set the minimum macOS deployment target to 10.13 by default (bumped from the previous 10.9) for a few releases now, so there is no need to set it explicitly during wheel repair. :D

@MaxHalford
Member

Hopefully the current workflow run finishes successfully in an hour and I can open a PR for the next time. Does that sound good @MaxHalford?

Yep, sounds good. Thanks for shedding some light on this, @agriyakhetarpal; it's not an obvious topic!

@gbolmier
Member Author

The release is finally out 🥳 Thanks a lot for your help @agriyakhetarpal!

I've actually already implemented and pushed most of the changes we were talking about (the 12-job split, and only ubuntu-latest for the Linux builds); see the last commits, or the latest version of pypi.yml directly. The workflow now runs in just an hour.

I removed the CIBW_BUILD_VERBOSITY: 2 flag because it produced too many logs for the GitHub Actions UI to display (see example run).

@agriyakhetarpal
Contributor

Thank you! 🎉 Good to see the split – indeed, 56 minutes in total for all the jobs included! Yes, CIBW_BUILD_VERBOSITY is probably a bit much – pip wheel hides the logs on successful runs, so we would see them anyway when a build fails, just not when it passes. If we switch to pypa/build sometime later on, this setting would be ignored, because it displays verbose output (equal to verbosity 1) by default.

Would y'all be interested in adding an out-of-tree Emscripten/Pyodide CI job for river? It would help us a lot when we try to update river's version again in Pyodide in-tree. 🙏🏻

@gbolmier
Member Author

Would y'all be interested in adding an out-of-tree Emscripten/Pyodide CI job for river? It would help us a lot when we try to update river's version again in Pyodide in-tree. 🙏🏻

Yes, @MaxHalford took the initiative a year ago (see #1290). The last entry of our "making a release" instructions is:

  1. Pyodide needs to be told there is a new release. This can be done by updating packages/river in online-ml/pyodide

But it looks like only River 0.19.0 was built and shared. @agriyakhetarpal, please let us know what you think is the best way to manage the Pyodide wheels.

If I understand your suggestion correctly, would adding something similar to the following job to our existing workflow be enough, and better? (Not sure about the Python versions; this may require a matrix.)

Snippet
  build_pyodide_wheels:
    name: Build Pyodide wheels
    needs: [build_macos_wheels]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up rust
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: nightly
      - name: Build wheels
        uses: pypa/cibuildwheel@v2
        timeout-minutes: 720
        env:
          CIBW_BUILD: "cp310-* cp311-* cp312-* cp313-*"
          CIBW_PLATFORM: pyodide
          CIBW_ENVIRONMENT: 'PATH="$HOME/.cargo/bin:$PATH"'
          CIBW_ENVIRONMENT_LINUX: 'PATH="$HOME/.cargo/bin:$PATH" CARGO_NET_GIT_FETCH_WITH_CLI="true"'
          CIBW_BEFORE_BUILD: >
            rustup default nightly &&
            rustup show
          CIBW_BEFORE_BUILD_LINUX: >
            curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain=nightly --profile=minimal -y &&
            rustup show
      - uses: actions/upload-artifact@v4
        with:
          name: artifact-pyodide
          path: ./wheelhouse/*.whl

@agriyakhetarpal
Contributor

That job should work perfectly for building WASM wheels, @gbolmier, but it won't test them – you should also add the CIBW_TEST_* environment variables and skip/fix the tests that fail in a WASM environment (for example, threads- or multiprocessing-related functionality would not work for the most part). As for the Python versions: Pyodide sets up its own Python interpreter for testing, and cibuildwheel takes care of setting up the Pyodide build system/frontend (pyodide-build) and the Emscripten toolchain automatically, so you won't need to add setup-python either. We support one Python version (currently 3.12) per Pyodide version.
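
For illustration, the testing knobs would look roughly like this (the deselected test path is a hypothetical placeholder, not an actual failing river test):

  env:
    CIBW_PLATFORM: pyodide
    CIBW_TEST_REQUIRES: pytest
    # Run the suite inside the Pyodide runtime; deselect tests that need
    # threads/multiprocessing. The --deselect target below is a placeholder.
    CIBW_TEST_COMMAND: >
      pytest {project}/river
      --deselect river/test_parallel.py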

For greater flexibility, I'd also recommend looking at the previous iteration of this CI job for NumPy, which would be numpy/numpy#25894. cibuildwheel ties the available Pyodide version to its own version, so there is a chance it might not be the latest and could become out of date until we update it there.

In-tree, we usually (try to) upgrade all packages periodically when we update the Emscripten version (which is when we have an ABI break). There is a chance river might have been broken when we did this chore the last time. Now that river has had a new release, I've upgraded it while working on pyodide/pyodide#4925.
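
For reference, the in-tree upgrade mostly amounts to bumping the package's meta.yaml recipe in the Pyodide repository; a rough sketch (the URL and checksum below are placeholders):

  package:
    name: river
    version: 0.22.0
  source:
    url: https://files.pythonhosted.org/packages/.../river-0.22.0.tar.gz  # placeholder
    sha256: <sha256 of the sdist>  # placeholder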

@gbolmier
Member Author

Thanks for all the info! I'll try to find the time to open a PR tomorrow evening or in the next few days.
