Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rolling groupby #460

Merged
merged 1 commit into from
Dec 6, 2023
Merged

Implement rolling groupby #460

merged 1 commit into from
Dec 6, 2023

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Dec 5, 2023

So, I'm just expanding the existing Rolling because the groupby rolling is actually doing the same thing as the ordinary rolling just with a local groupby. The implementation feels very unclean. I'll review the class hierarchy again if there is a more elegant way to modify the logic here. Any pointers appreciated

Comment on lines +82 to +86
if self.groupby_kwargs is not None:
return type(parent)(
type(self)(self.frame[columns], *self.operands[1:]),
*parent.operands[1:],
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not even sure what this is doing and it does feel like we're duplicating work but this is what I found for the groupby implementations and this seems to work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This drops all columns except the parent columns and the by columns from the df before the groupby and then drops the by column after the groupby if not in parent.columns

Yes this is needed for groupby to avoid dropping too many columns


actual = ddf.groupby("group1").rolling("1D").sum()["column1"]
expected = df.groupby("group1").rolling("1D").sum()["column1"]
actual.optimize()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we would test this against

df[["column1", "group1"]].groupby("group1").rolling("1D").sum()["column1"].optimize()._name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@fjetter fjetter force-pushed the rolling_groupby branch 2 times, most recently from 69c0651 to ed090dc Compare December 6, 2023 09:54
@phofl phofl merged commit 44ccf69 into dask:main Dec 6, 2023
7 checks passed
@phofl
Copy link
Collaborator

phofl commented Dec 6, 2023

thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants