Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DDD-4 Manipulate Connector Offset via Debezium Signals #12

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ani-sha
Copy link
Member

@ani-sha ani-sha commented Aug 8, 2024

This is the first draft of offset manipulation design document (simplified version of #6), starting with implementation for Debezium Postgres connector.

cc : @jpechane

DDD-4.md Outdated Show resolved Hide resolved
DDD-4.md Outdated

* `name` - The name of the action. This will be `change-offset`.
* `offset-position` - The new offset position to set (which corresponds to `lsn_commit` in the PostgreSQL connector).
* `last-offset-position` - The last offset position to set (which corresponds to `lsn_proc` in the PostgreSQL connector).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to come up with better naming and descrption/explanation, current one is a bit confusing.
Also I think we need an example of the signal message in the DDD too.

Copy link
Member

@Naros Naros Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should tie the change-offset payload to any specific connector implementation.

While there could be some common attributes between them, I believe it's best to take this as an iterative approach where we consider the payload being processed by the signal handler and delegated to each connector with a specific impl that validates & returns the writable payload. Then we can look to refactor that as a second pass, if needed.

So I would propose the signal payload may look:

{
  /* This acts as an envelope where we can add some common attributes in pass 2 */
  "connector-offsets": {
    /* This is the connector-offset payload, to be processed by each connector individually */
    /* Here we would have unique key/value tuples, this being an example of PostgreSQL */
    "last_commit_lsn": "....",
    "last_procsesed_lsn": "...",
    
    /* Here's another example for something like Oracle */
    "last_processed_scn": "123456789",
    "committed_scns": [
      {
        "redo_thread": 1,
        "commit_scn": "123456799",
        "transactions_committed_at_commit_scn": ["trx1", "trx2"]
      },
      {
        "redo_thread": 2,
        "commit_scn": "123456650",
        "transactions_committed_at_commit_scn": ["trx3"]
      }
    ]
  }
}

The envelope could be something handled directly by the main signal handler if needed, but we'd have a connector-specific object that is provided the contents of the "connector-offsets" and it would return the necessary Struct payload based on that data to be written to the Kafka offsets.

In this case, you may not need the OffsetValidator contract to be separate, too. wdyt?

The following refers to the offsets regardless of streaming/snapshot:

```json
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also make a distintion between the fields that are an the offsets and those that are actually necessary for repositioning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants