Discussion: state cleaning based on watermark #18728
Comments
Ideas:
Personally prefer 3 > 1 > 2.
Related discussion for interval join: https://github.com/risingwavelabs/rfcs/pull/32/files#r1059266190
And I think the 2nd method is exactly what Flink does.
That is feasible.
asof join #18503
Let's use #18802 afterwards.
The watermark mechanism ensures that no further upstream messages will be sent with `column payload <= watermark` in the watermark column. That means if the watermark column is part of the state table PK, we can safely clean the states whose `column payload <= downstream watermark` in the watermark column. There are two cases to consider here:

1. The watermark column is the PK prefix (`| watermark_col | other_pk_col1 | other_pk_col2 | ...`).
2. The watermark column is not the PK prefix (`| other_pk_col1 | watermark_col | ...`).

Currently we support 1 via the hummock table vnode watermark, but 2 is not supported, so we rely on streaming to ensure that the watermark column is the PK prefix to make state cleaning effective.
However, in the following cases (cc @stdrc @st1page @yuhao-su to confirm), streaming cannot put the watermark column as the PK prefix, even though state cleaning should technically still be performed:
This issue is created to capture and discuss ideas to perform state cleaning for case 2 (`| other_pk_col1 | watermark_col | ...`).
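To make the difference between the two PK layouts concrete, here is a minimal sketch in Rust. It is not RisingWave's or Hummock's actual implementation: a `BTreeMap` stands in for a sorted state table, and the function names `clean_prefix_layout` and `clean_non_prefix_layout` are hypothetical. It follows the `column payload <= watermark` cleaning rule stated above.

```rust
use std::collections::BTreeMap;

/// Case 1 (hypothetical helper): PK = (watermark_col, other_pk_col).
/// Expired rows form one contiguous range at the front of the sorted
/// state, so a single range cut removes them all.
fn clean_prefix_layout(state: &mut BTreeMap<(i64, u32), String>, watermark: i64) -> usize {
    let before = state.len();
    match watermark.checked_add(1) {
        // Keep only keys >= (watermark + 1, MIN), i.e. watermark_col > watermark.
        Some(next) => {
            let kept = state.split_off(&(next, u32::MIN));
            *state = kept;
        }
        // watermark == i64::MAX: every row is at or below the watermark.
        None => state.clear(),
    }
    before - state.len()
}

/// Case 2 (hypothetical helper): PK = (other_pk_col, watermark_col).
/// Expired rows are scattered, one small range per distinct
/// other_pk_col value, so no single range delete covers them. This
/// naive version inspects every key.
fn clean_non_prefix_layout(state: &mut BTreeMap<(u32, i64), String>, watermark: i64) -> usize {
    let before = state.len();
    state.retain(|&(_, wm_col), _| wm_col > watermark);
    before - state.len()
}

fn main() {
    let mut prefix = BTreeMap::new();
    let mut non_prefix = BTreeMap::new();
    for (wm_col, other) in [(1i64, 7u32), (2, 3), (3, 7)] {
        prefix.insert((wm_col, other), format!("row@{wm_col}"));
        non_prefix.insert((other, wm_col), format!("row@{wm_col}"));
    }

    let removed_prefix = clean_prefix_layout(&mut prefix, 2);
    let removed_non_prefix = clean_non_prefix_layout(&mut non_prefix, 2);
    println!("prefix layout removed {removed_prefix}, remaining {:?}", prefix.keys());
    println!("non-prefix layout removed {removed_non_prefix}, remaining {:?}", non_prefix.keys());
}
```

Both functions remove the same logical rows. The point of the sketch is the cost: the prefix layout needs one ordered cut, while the non-prefix layout has to visit every `other_pk_col` group, which is the part this issue is looking for a better approach to.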