bug: Redpanda source `startup mode = latest` doesn't work in batch query #12361

fuyufjh · 2023-09-17T03:28:34Z

Describe the bug

I created a source on a Redpanda topic with the startup mode set to `latest`. However, when I query the source, the results always start from the earliest records in the topic rather than the latest ones, even though the topic holds three days of data. What is the purpose of specifying `latest` as the startup mode in this case?

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

Feedback from users.

ZENOTME · 2023-09-17T07:25:24Z

I think we don't support the semantics of `latest` when querying a source in batch mode.

If my understanding is correct, in streaming, `latest` means that we only see the data that arrives after the materialized view is created. E.g.
data: 1 | create source | data: 2 | create materialized view | data: 3
The materialized view can only see data 3, because we fetch the partition offsets when we actually create the materialized view.
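A minimal sketch of the streaming behavior described above, assuming a toy model in which a single partition is a Python list and an offset is a list index. `resolve_latest` and `streaming_timeline` are hypothetical helpers, not RisingWave functions; `latest` is treated as the offset one past the last written message, resolved at the moment the materialized view is created.

```python
# Toy model (hypothetical, not RisingWave code): one partition is a list of
# messages and an offset is an index into that list.


def resolve_latest(partition: list) -> int:
    """Resolve 'latest' to the next offset to be written (one past the end)."""
    return len(partition)


def streaming_timeline() -> list:
    partition = []
    partition.append("data 1")  # arrives before CREATE SOURCE
    # CREATE SOURCE: in streaming, no offset is captured at this point
    partition.append("data 2")
    # CREATE MATERIALIZED VIEW: 'latest' is resolved now
    mv_start = resolve_latest(partition)
    partition.append("data 3")
    # The materialized view consumes everything from its start offset onward
    return partition[mv_start:]


print(streaming_timeline())  # ['data 3']
```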

According to #6725, in batch mode we fetch the partition offsets every time a batch query comes in. We can't directly apply the streaming semantics of `latest` here, because every query would then return empty results. So to support `latest`, I think we first need to define its semantics for batch source queries.
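A minimal sketch of why reusing the streaming semantics would break batch queries, assuming a single partition modeled as a Python list; `batch_scan` is a hypothetical helper. Because the scan resolves `latest` when the query arrives, its start offset equals the current end of the partition and the scanned range is always empty, whereas `earliest` scans the whole partition, which is the kind of result the reporter observed.

```python
# Toy model (hypothetical, not RisingWave code): a batch scan reads the
# half-open offset range [start, end) of one partition.


def batch_scan(partition: list, startup_mode: str) -> list:
    # Offsets are fetched each time a batch query comes in (per #6725).
    end = len(partition)
    if startup_mode == "latest":
        start = end  # 'latest' resolved at query time: nothing left to read
    else:
        start = 0  # 'earliest': scan from the beginning
    return partition[start:end]


partition = ["data 1", "data 2", "data 3"]
print(batch_scan(partition, "latest"))    # []
print(batch_scan(partition, "earliest"))  # ['data 1', 'data 2', 'data 3']
```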

E.g. `latest` in a batch source query could mean that we only see the data that arrives after the source is created. In the example above, that means we see data 2 and data 3:
data: 1 | create source | data: 2 | create materialized view | data: 3
To support these semantics, we could store the partition offsets when the source is created.
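A minimal sketch of this proposal, assuming a single partition modeled as a Python list. `Source`, `create_source`, and `batch_scan` are hypothetical names used only for illustration. The start offset is captured once at source creation, while the end offset is still fetched per query, so repeated batch queries see everything written since the source was created.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical source metadata that persists a start offset."""
    partition: list
    start_offset: int  # captured once, at CREATE SOURCE time


def create_source(partition: list, startup_mode: str) -> Source:
    # 'latest' pins the offset one past the last message written so far.
    start = len(partition) if startup_mode == "latest" else 0
    return Source(partition, start)


def batch_scan(source: Source) -> list:
    # The end offset is still fetched each time a batch query comes in.
    end = len(source.partition)
    return source.partition[source.start_offset:end]


partition = ["data 1"]
src = create_source(partition, "latest")
partition.append("data 2")
# CREATE MATERIALIZED VIEW would happen here; it does not touch the stored offset
partition.append("data 3")
print(batch_scan(src))  # ['data 2', 'data 3']
```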

cc @fuyufjh @tabVersion @liurenjie1024

liurenjie1024 · 2023-09-18T02:17:01Z

It's by design: `latest` is meaningless in a batch query. Users are supposed to filter messages with `_rw_kafka_timestamp` instead: https://docs.risingwave.com/docs/current/create-source-kafka/#query-kafka-timestamp
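In SQL this workaround amounts to a predicate on `_rw_kafka_timestamp` in the `WHERE` clause of the batch query (see the linked documentation for the exact syntax). The Python sketch below only models the filtering semantics; the message list, the `recent` helper, and the timestamps are hypothetical, with three days of data to mirror the reported topic. Each message carries its Kafka timestamp, and only messages newer than a cutoff are kept.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical messages: (Kafka timestamp, payload), three days of data.
messages = [
    (datetime(2023, 9, 14, 12, 0, tzinfo=timezone.utc), "day 1"),
    (datetime(2023, 9, 15, 12, 0, tzinfo=timezone.utc), "day 2"),
    (datetime(2023, 9, 16, 12, 0, tzinfo=timezone.utc), "day 3"),
]


def recent(messages, now, window):
    """Keep payloads whose timestamp is later than now - window."""
    cutoff = now - window
    return [payload for ts, payload in messages if ts > cutoff]


now = datetime(2023, 9, 17, 3, 0, tzinfo=timezone.utc)
print(recent(messages, now, timedelta(days=1)))  # ['day 3']
```

Unlike a startup mode, the cutoff is evaluated per query, so each batch query can choose its own time window.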

fuyufjh added type/bug Something isn't working priority/high labels Sep 17, 2023

fuyufjh assigned ZENOTME Sep 17, 2023

github-actions bot added this to the release-1.3 milestone Sep 17, 2023

liurenjie1024 closed this as completed Sep 18, 2023
