Implement rolling groupby #460

fjetter · 2023-12-05T11:28:23Z

I'm expanding the existing Rolling class, because a groupby rolling does the same thing as an ordinary rolling, just with a local groupby. The implementation feels very unclean. I'll go over the class hierarchy again to see whether there is a more elegant way to modify the logic here. Any pointers appreciated.

fjetter · 2023-12-05T16:09:20Z

dask_expr/_rolling.py

+            if self.groupby_kwargs is not None:
+                return type(parent)(
+                    type(self)(self.frame[columns], *self.operands[1:]),
+                    *parent.operands[1:],
+                )


I'm honestly not sure what this is doing, and it feels like we're duplicating work, but this is the pattern I found in the groupby implementations and it seems to work.

This drops all columns except the parent's columns and the `by` columns from the frame before the groupby, and then drops the `by` column after the groupby if it is not in `parent.columns`.

Yes, this is needed for groupby to avoid dropping too many columns.
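For illustration, the equivalence this projection relies on can be checked in plain pandas. This is only a sketch under stated assumptions: the frame, its values, and the extra `column2` are made up here, and it does not exercise the dask-expr optimizer itself. It shows that selecting the requested column plus the `by` column before the groupby rolling gives the same result as selecting the column afterwards.

```python
import pandas as pd

# Hypothetical example data; column names follow the test in this PR,
# "column2" is an extra column that the projection should make unnecessary.
index = pd.date_range("2023-01-01", periods=6, freq=pd.Timedelta(hours=6))
df = pd.DataFrame(
    {
        "group1": [0, 1, 0, 1, 0, 1],
        "column1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "column2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=index,
)

# Select the column after the groupby rolling (what the user writes).
full = df.groupby("group1").rolling("1D").sum()["column1"]

# Select the needed columns first: the requested column plus the `by` column.
projected = (
    df[["column1", "group1"]].groupby("group1").rolling("1D").sum()["column1"]
)

# Both paths should produce the same Series; this raises if they differ.
pd.testing.assert_series_equal(full, projected)
```

Keeping the `by` column in the pre-selection is the important part: without `group1` the groupby could not be evaluated on the narrowed frame.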

dask_expr/tests/test_rolling.py

phofl · 2023-12-06T08:59:05Z

dask_expr/tests/test_groupby.py

+
+    actual = ddf.groupby("group1").rolling("1D").sum()["column1"]
+    expected = df.groupby("group1").rolling("1D").sum()["column1"]
+    actual.optimize()


Ideally, we would test this against:

df[["column1", "group1"]].groupby("group1").rolling("1D").sum()["column1"].optimize()._name

phofl · 2023-12-06T10:34:45Z

thx


fjetter force-pushed the rolling_groupby branch 2 times, most recently from 69c0651 to ed090dc Compare December 6, 2023 09:54

Implement rolling groupby

7b8a654

fjetter force-pushed the rolling_groupby branch from ed090dc to 7b8a654 Compare December 6, 2023 09:54

phofl approved these changes Dec 6, 2023


phofl merged commit 44ccf69 into dask:main Dec 6, 2023
7 checks passed
