Discussion: state cleaning based on watermark #18728

Closed
hzxa21 opened this issue Sep 26, 2024 · 6 comments

@hzxa21
Collaborator

hzxa21 commented Sep 26, 2024

The watermark mechanism ensures that no further upstream messages will be sent with a value <= watermark in the watermark column. That means if the watermark column is part of the state table PK, we can safely clean the state entries whose value in the watermark column is <= the downstream watermark. There are two cases to consider here:

  1. The watermark column is the PK prefix (| watermark_col | other_pk_col1 | other_pk_col2 | ...).
  2. The watermark column is not the PK prefix (| other_pk_col1 | watermark_col | ...).

Currently we support case 1 via the hummock table vnode watermark, but case 2 is not supported, so we rely on streaming to ensure that the watermark column is the PK prefix for state cleaning to be effective.
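To make the difference between the two cases concrete, here is a minimal toy sketch. It assumes state keys are plain tuples kept in sorted order; the helper names `expired_positions` and `is_contiguous_prefix` are hypothetical and not part of the codebase, and real keys are serialized bytes partitioned by vnode. It only illustrates why a prefix watermark column lets one range cover all expired entries, while a non-prefix one scatters them.

```python
def expired_positions(sorted_keys, wm_index, watermark):
    """Positions (in key order) of entries whose watermark column is <= watermark."""
    return [i for i, key in enumerate(sorted_keys) if key[wm_index] <= watermark]


def is_contiguous_prefix(positions):
    """True when positions are exactly 0..n-1, i.e. a single range covers them."""
    return positions == list(range(len(positions)))


rows = [(5, "a"), (20, "a"), (7, "b"), (30, "b")]  # (watermark_col, other_pk_col)
watermark = 10

# Case 1: PK = | watermark_col | other_pk_col |
case1 = sorted(rows)
# Case 2: PK = | other_pk_col | watermark_col |
case2 = sorted((other, wm) for wm, other in rows)

case1_expired = expired_positions(case1, 0, watermark)
case2_expired = expired_positions(case2, 1, watermark)

print(case1_expired, is_contiguous_prefix(case1_expired))
print(case2_expired, is_contiguous_prefix(case2_expired))
```

In case 1 the expired entries sit at the front of the key space, so a single key-range bound is enough. In case 2 they are interleaved with live entries under each `other_pk_col` value.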

However, in the following cases (cc @stdrc @st1page @yuhao-su to confirm), streaming cannot put the watermark column as the PK prefix, but technically state cleaning should still be performed:

This issue is created to capture and discuss ideas to perform state cleaning for case 2 (| other_pk_col1 | watermark_col | ...).

@github-actions github-actions bot added this to the release-2.1 milestone Sep 26, 2024
@hzxa21
Collaborator Author

hzxa21 commented Sep 26, 2024

Ideas:

  1. Implement a mechanism to periodically run delete from __internal_xxx where watermark_col < x on the state table.

    • Pros:
      • Non-intrusive to executor and storage
    • Cons:
      • Scanning the whole table periodically may waste resources and add overhead
      • Can generate many point delete tombstones in storage, which may affect storage read performance and increase compaction load.
  2. The executor maintains an index on the watermark column. Before it emits a downstream watermark, the executor scans the
    index and issues point deletes to its state table for state cleaning. The implementation is similar to maintaining an internal dynamic filter.

    • Pros:
      • Non-intrusive to storage
      • Easy to implement
    • Cons:
      • Happens on the critical path of streaming, which may affect freshness
      • Can generate many point delete tombstones in storage, which may affect storage read performance and increase compaction load.
  3. Implement a mechanism in compaction to be aware of watermark and clean the entries below watermark asynchronously during compaction.

    • Pros:
      • Non-intrusive to executor
      • Minimize overhead on read because no point delete tombstones will be generated
      • Less waste on resources because it happens along with the regular compaction
    • Cons:
      • Since state cleaning is asynchronous, the executor must either make sure it never touches data below the watermark or filter such data on its own.
      • Compaction needs to be more aware of the table schema and needs to do key deserialization
      • May complicate the compaction strategy if task generation needs to consider the state-cleaning watermark in addition to the balance of the LSM tree.

Personally, I prefer 3 > 1 > 2.
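A rough sketch of what idea 3 could look like, under simplifying assumptions: sorted runs are modeled as Python lists of (pk_tuple, value) ordered newest run first, keys are already deserialized tuples, and the function name `compact` and its parameters are hypothetical. Real compaction also has to deal with epochs, tombstones and vnodes, which are left out here.

```python
import heapq


def compact(runs, wm_index, watermark):
    """Merge sorted runs (newest first) into one run, dropping expired entries.

    Each run is a list of (pk_tuple, value) sorted by pk_tuple with unique keys.
    Entries whose watermark column is <= watermark are dropped silently, so no
    point delete tombstone is ever written for them.
    """
    # Tag each entry with the age of its run so the newest version of a key
    # sorts first among duplicates.
    tagged = ([(key, age, value) for key, value in run] for age, run in enumerate(runs))
    merged = []
    last_key = None
    for key, _age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # older version of a key we already handled
        last_key = key
        if key[wm_index] <= watermark:
            continue  # below the watermark: clean it during compaction
        merged.append((key, value))
    return merged


newer = [(("a", 20), "v2"), (("b", 7), "x")]
older = [(("a", 5), "old"), (("a", 20), "v1"), (("b", 30), "y")]
print(compact([newer, older], wm_index=1, watermark=10))
```

The `key[wm_index]` lookup is where the schema awareness and key deserialization mentioned in the cons would come in.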

@st1page
Contributor

st1page commented Sep 26, 2024

related discussion for interval join https://github.com/risingwavelabs/rfcs/pull/32/files#r1059266190

@st1page
Contributor

st1page commented Sep 26, 2024

related discussion https://github.com/risingwavelabs/rfcs/pull/32/files#r1059266190

And I think the 2nd method is exactly what Flink does.
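For reference, the 2nd method (an index on the watermark column plus point deletes) could be sketched roughly as follows. This is a toy model and not Flink's or RisingWave's actual code: the class `IndexedState` and its methods are hypothetical, the state table is a dict keyed by (other_pk_col, watermark_col), the index is an in-memory min-heap, and user-issued deletes are not handled.

```python
import heapq


class IndexedState:
    """State table plus a secondary index ordered by the watermark column."""

    def __init__(self):
        self.state = {}  # pk (other_pk_col, watermark_col) -> value
        self.index = []  # min-heap of (watermark_col, pk)

    def put(self, pk, value):
        if pk not in self.state:
            heapq.heappush(self.index, (pk[1], pk))
        self.state[pk] = value

    def advance_watermark(self, watermark):
        """Called before emitting a downstream watermark; returns the deleted PKs."""
        deleted = []
        while self.index and self.index[0][0] <= watermark:
            _, pk = heapq.heappop(self.index)
            del self.state[pk]  # one point delete per expired row
            deleted.append(pk)
        return deleted


s = IndexedState()
for pk in [("a", 5), ("a", 20), ("b", 7), ("b", 30)]:
    s.put(pk, "row")
print(s.advance_watermark(10))
print(sorted(s.state))
```

Each popped entry corresponds to one point delete, which is where the tombstone concern comes from, and the loop runs before the watermark is emitted, so it sits on the streaming critical path.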

@st1page
Contributor

st1page commented Sep 26, 2024

> Since the state cleaning is asynchronous, executor needs to make sure data below watermark must not be touched or executor needs to filter data on its own.

That is feasible
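A minimal sketch of the executor-side filtering, assuming rows are (pk_tuple, value) pairs and that entries with a value <= watermark in the watermark column count as cleaned even if compaction has not physically removed them yet. The function name `read_visible` is hypothetical.

```python
def read_visible(state_rows, wm_index, watermark):
    """Yield only rows the executor may still use.

    Rows at or below the watermark may or may not still exist physically,
    so they are hidden here to keep reads deterministic.
    """
    for key, value in state_rows:
        if key[wm_index] > watermark:
            yield key, value


rows = [(("a", 5), "old"), (("a", 20), "v2"), (("b", 7), "x"), (("b", 30), "y")]
print(list(read_visible(rows, wm_index=1, watermark=10)))
```

With such a filter the read result does not depend on whether the asynchronous cleaning has already run.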

@yuhao-su
Contributor

asof join #18503

@fuyufjh
Member

fuyufjh commented Oct 17, 2024

Let's use #18802 afterwards

@fuyufjh fuyufjh closed this as completed Oct 17, 2024
4 participants