AutoGluon v1.2 fails #661

Open
PGijsbers opened this issue Nov 29, 2024 · 10 comments
Labels
bug Something isn't working framework For issues with frameworks in the current benchmark

@PGijsbers
Collaborator

When AutoGluon resorts to pickle files to transfer data between benchmark and framework process, the benchmark fails because there is an incompatible numpy version installed in the framework's virtual environment.

The basic AutoGluon tests succeeded on #656 but failed on #657. The latter obviously should have no effect on the functioning of the integration, but now there is a "No module named numpy._core" error message. However, from the logs we see that the former used AutoGluon 1.1.1, whereas the latter used AutoGluon 1.2.

@PGijsbers PGijsbers added bug Something isn't working framework For issues with frameworks in the current benchmark labels Nov 29, 2024
@PGijsbers
Collaborator Author

PGijsbers commented Dec 11, 2024

It looks like AutoGluon ends up installing a more modern version of numpy (2.0.2). Requirements are violated during the installation of AG, which results in this error.

@PGijsbers
Collaborator Author

I think the long-term solution is probably to avoid using pickles and instead use parquet for everything (except we have to figure out what to do with sparse data).

@deadsoul44

deadsoul44 commented Dec 13, 2024

I was able to run a task with AutoGluon 1.2 on 2024-11-30. But I get this error now. This is probably due to the new release of numpy. Numpy 2.1.3 should be fine, numpy 2.2.0 is the problem probably.

@deadsoul44

Tried numpy 2.0.0, still fails with the same message.

@Innixma
Collaborator

Innixma commented Dec 13, 2024

Thanks for highlighting! This is actually also a bug in AutoGluon to an extent, because CatBoost doesn't support numpy 2.x yet. The problem is that we are installing scikit-learn-intelex as a separate install step in setup.sh: https://github.com/openml/automlbenchmark/blob/master/frameworks/AutoGluon/setup.sh#L49

Because the latest skex uses numpy 2, it force installs numpy 2, which breaks CatBoost. So this would harm AutoGluon's predictive quality quite a bit. It might have even harmed my latest benchmarks, I didn't look closely because we only added numpy 2.0 support very close to the release. To fix this we would want to include the skex install in the same line as the rest of the install, rather than doing it after. This would then align with the user install experience of skex.

Once I'm back from NeurIPS I'll try to take a look at this and see if I can fix.
Note that what I described above won't fix the inherent incompatibility of AMLB with numpy 2.0 pickle files. Resolving that is a separate issue that might be unrelated to AutoGluon itself but rather how AMLB transfers files from frameworks to the core AMLB environment.

@Innixma
Collaborator

Innixma commented Dec 13, 2024

@PGijsbers Short term fix would be changing the skex install line from:

$UV pip install tabular/[skex]

to

$UV pip install tabular/[all,skex]

Not ideal, but should work.

@Innixma
Collaborator

Innixma commented Dec 13, 2024

Feel free to try it out and merge if it appears to work

@PGijsbers
Collaborator Author

Thanks for the message, will try tomorrow.

Re:

"that might be unrelated to AutoGluon itself but rather how AMLB transfers files from frameworks to the core AMLB environment."

That's what I was alluding to with

"I think the long-term solution is probably to avoid using pickles and instead use parquet for everything (except we have to figure out what to do with sparse data)."

It wasn't meant to apply only to AG, but to all frameworks in general. Relying on pickle is fickle (heh).

@PGijsbers
Collaborator Author

PGijsbers commented Dec 16, 2024

Seems to work for --platform=linux/amd64 but not --platform=linux/arm64 (default on Apple Silicon). For ARM it fails to resolve the dependencies, so the installation fails.

@Innixma
Collaborator

Innixma commented Dec 16, 2024

Ah... Can you try adding logic to not do the skex install if it is an ARM system? I believe skex isn't supported on ARM.
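
A rough sketch of what that logic could look like in a shell script such as setup.sh, assuming the architecture is read from `uname -m` (which prints `aarch64` in linux/arm64 containers and `arm64` on macOS). The helper name `autogluon_tabular_extras` and the `EXTRAS` variable are hypothetical and not taken from the actual setup.sh. Whether skex is really unsupported on ARM is unconfirmed in this thread, so treat the ARM branch as an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: pick the extras for the tabular install based on CPU architecture.
# `autogluon_tabular_extras` is a hypothetical helper, not part of setup.sh.
autogluon_tabular_extras() {
  # $1: machine name as printed by `uname -m`
  case "$1" in
    aarch64|arm64)
      # ARM (e.g. linux/arm64 containers): leave out skex,
      # assuming it is not installable there.
      echo "all"
      ;;
    *)
      # Everything else keeps the short-term fix proposed above.
      echo "all,skex"
      ;;
  esac
}

EXTRAS="$(autogluon_tabular_extras "$(uname -m)")"
# In setup.sh this value would feed the existing install line, e.g.:
#   $UV pip install "tabular/[${EXTRAS}]"
echo "tabular extras for $(uname -m): ${EXTRAS}"
```

Keeping the decision in a small function makes it easy to check both branches without running on ARM hardware, for example by calling `autogluon_tabular_extras arm64` directly.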
