Polars product inconsistent with pandas product #424

Open
AdrianSosic opened this issue Nov 9, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@AdrianSosic
Collaborator

The order of the Cartesian product elements differs depending on whether the product is created via polars or pandas. That is, depending on BAYBE_DEACTIVATE_POLARS, the following code gives two different outputs:

from baybe.parameters.numerical import NumericalDiscreteParameter
from baybe.searchspace.discrete import SubspaceDiscrete

s = SubspaceDiscrete.from_product(
    [
        NumericalDiscreteParameter("a", [0, 1]),
        NumericalDiscreteParameter("b", [3, 4, 5]),
    ]
)
print(s.exp_rep)

Pandas:

     a    b
0  0.0  3.0
1  0.0  4.0
2  0.0  5.0
3  1.0  3.0
4  1.0  4.0
5  1.0  5.0

Polars:

     a    b
0  0.0  3.0
1  1.0  3.0
2  0.0  4.0
3  1.0  4.0
4  0.0  5.0
5  1.0  5.0

Not a "bug" in the strict sense, but still annoying, and it can cause trouble in testing (this is how I found it) or when people rely on the actual ordering and then toggle the flag.
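To illustrate why an order-sensitive test trips over this, here is a minimal sketch using only the rows printed above, copied into plain Python lists. The variable names are made up for illustration and are not part of BayBE.

```python
# Rows (a, b) copied from the two outputs shown above.
pandas_rows = [(0.0, 3.0), (0.0, 4.0), (0.0, 5.0), (1.0, 3.0), (1.0, 4.0), (1.0, 5.0)]
polars_rows = [(0.0, 3.0), (1.0, 3.0), (0.0, 4.0), (1.0, 4.0), (0.0, 5.0), (1.0, 5.0)]

# A test that compares the rows position by position fails ...
order_sensitive_equal = pandas_rows == polars_rows

# ... while a comparison that ignores the row order passes,
# because both backends produce the same set of combinations.
order_insensitive_equal = sorted(pandas_rows) == sorted(polars_rows)

print(order_sensitive_equal, order_insensitive_equal)
```

So the content of the search space is identical; only code that depends on row positions is affected.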

@AdrianSosic AdrianSosic added the bug Something isn't working label Nov 9, 2024
AdrianSosic added a commit that referenced this issue Nov 9, 2024
@Scienfitz
Collaborator

I don't think this should be labelled a bug; the order is inherent to the method used plus the order of the parameters passed.

Unless there is an inconsistent handling of the latter from our side, this might not even be fixable from our end

In the provided example, it seems like polars treats the parameters starting from the last while pandas starts from the first. If this is true, we might be able to simply exploit this and reverse the order of the parameters passed to the polars method.
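A small sketch of what that reversal idea would look like, assuming the hypothesis holds. Both functions are hypothetical stand-ins built on itertools.product, not the actual pandas or polars code paths.

```python
from itertools import product


def product_first_slowest(*value_lists):
    """Row order seen with pandas above: the first parameter varies slowest."""
    return list(product(*value_lists))


def product_first_fastest(*value_lists):
    """Hypothetical backend with the opposite convention: the first parameter varies fastest."""
    return [row[::-1] for row in product(*reversed(value_lists))]


a = [0.0, 1.0]
b = [3.0, 4.0, 5.0]

# Reproduces the order of the polars output above.
observed = product_first_fastest(a, b)

# Workaround: pass the parameters reversed, then restore the column order.
workaround = [row[::-1] for row in product_first_fastest(b, a)]

print(observed)
print(workaround == product_first_slowest(a, b))
```

This only helps if the difference really comes from the iteration convention of the product itself.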

@AVHopp
Collaborator

AVHopp commented Nov 11, 2024

Agreed, this should be investigated in a bit more detail. But can you elaborate a little on how and why this caused issues in testing?

@AdrianSosic
Collaborator Author

I had a deeper look and noticed that the problem is not caused by the Cartesian part (which works just like the pandas counterpart) but by the streaming=True flag in the collect call, which seems to destroy the order. This could be either because:

  • this is a "feature", i.e. ordering is no longer guaranteed in streaming mode, or
  • the streaming feature is not yet stable in polars (which they say in the docs).

What do we make of this?

@Scienfitz
Collaborator

Could there be a reorder step in which the filtered polars result df is ordered according to the parameter and parameter-value order (i.e. the one the Cartesian product implies)? Just .sort_values?
Other than that, I don't see a fix.
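A rough sketch of such a reorder step, in plain Python rather than against a real dataframe. One caveat: sorting by the values themselves (sort_values in pandas, sort in polars) only reproduces the Cartesian order if every parameter lists its values in ascending order. The sketch therefore sorts by the position of each value in its parameter's value list. The names params, rows and cartesian_key are made up for illustration.

```python
# Parameter name -> values in the order they were declared.
# "b" is deliberately not ascending to show the difference from a plain value sort.
params = {"a": [0.0, 1.0], "b": [5.0, 3.0, 4.0]}

# Rows as they might come back in arbitrary order, with one combination filtered out.
rows = [
    {"a": 1.0, "b": 3.0},
    {"a": 0.0, "b": 4.0},
    {"a": 1.0, "b": 5.0},
    {"a": 0.0, "b": 5.0},
    {"a": 0.0, "b": 3.0},
]


def cartesian_key(row):
    """Position of each value within its parameter's declared value list."""
    return tuple(values.index(row[name]) for name, values in params.items())


# The first parameter varies slowest, matching the pandas ordering.
ordered = sorted(rows, key=cartesian_key)

for row in ordered:
    print(row)
```

With a dataframe, the same idea could be expressed by mapping each column to these positions and sorting on the mapped columns.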

3 participants