-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor independent cluster configs, pin pandas<2
in cluster environments
#1120
Conversation
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1120 +/- ##
==========================================
+ Coverage 81.76% 81.87% +0.11%
==========================================
Files 78 78
Lines 4397 4397
Branches 792 792
==========================================
+ Hits 3595 3600 +5
+ Misses 631 622 -9
- Partials 171 175 +4 see 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
pandas<2
pandas<2
in cluster environments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this!
Once fugue updates the pandas pinning or #1111 lands we should be able to remove the pinning and have pandas 2.0 in both places (hopefully).
Curious if in addition to the refactor here it might to useful to have some utility that checks that the environment/versions of packages on the client, scheduler and workers match before starting the tests. Maybe even upstream dask/distributed could benefit from something like this (if it's easy to do)
From what I recall of some previous runs, this should be something that currently exists in upstream Dask, as I recall there being warnings thrown if mismatched versions of some core packages were detected between the client/scheduler/workers - might be just seeing if there's appetite to configure this to raise an error instead of a warning |
Looks like the latest Dask release has made it so the images now pull in pandas 2.0, which ends up being inconsistent with the client environment for cluster testing (fugue pulls in a
pandas<2
constraint).This applies that constraint to the cluster environments, which should resolve the resulting failures that began popping up; I've also made some modifications to the cluster configuring Docker Compose files so that:
Hoping that having this shared environment file should cut down on repetition and make it easier to document modifications we need to make to the base cluster environment to get things working