
feat(iceberg): support iceberg engine table (in local env) #19577

Merged
60 commits merged from dylan/support_create_iceberg_engine_table into main on Dec 5, 2024

Conversation

@chenzl25 (Contributor) commented Nov 26, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Tracking: Iceberg engine table #19418
  • Support ./risedev d iceberg-engine to set up a local environment to run the iceberg engine table (alternatively, use ./risedev d full directly).
  • Support create/drop/select for iceberg engine tables. The DDL in this PR is not atomic, but that should be fine for a first version.
  • The iceberg catalog is stored in our SQL meta backend, and the data is stored in S3-compatible cloud storage.
  • For simplicity, this PR retrieves the meta backend connection info and the S3 warehouse info from environment variables; a later PR will improve this by fetching the info from the meta node.
  • Make the iceberg sink's S3 access key and secret key optional, and add an enable_config_load field to load credentials from the default credential provider chain (see the sink sketch after the example below).
  • No compaction in this PR.

Example

create table t(id int primary key, name varchar) engine = iceberg;
insert into t values(1, 'xxx');
select * from t;
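
The credential-related bullet above can be illustrated with a minimal sketch of an iceberg sink that omits the S3 keys. Only enable_config_load and the now-optional s3.access.key / s3.secret.key come from this PR; every other option name and value below is an assumption for illustration, not taken from the PR.

-- Hedged sketch: load S3 credentials from the default credential provider chain
-- instead of passing s3.access.key / s3.secret.key explicitly.
create sink t_iceberg_sink from t
with (
    connector = 'iceberg',
    type = 'upsert',                              -- assumed
    primary_key = 'id',                           -- assumed
    warehouse.path = 's3://my-bucket/warehouse',  -- assumed option name
    database.name = 'demo_db',                    -- assumed option name
    table.name = 't',                             -- assumed option name
    enable_config_load = 'true'
);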

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features. See Sqlsmith: Sql feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xxchan xxchan added the ci/run-backwards-compat-tests Run backwards compatibility tests in your PR. label Nov 29, 2024
Resolved review threads (now outdated) on src/meta/model/src/table.rs, ci/scripts/e2e-iceberg-engine-test.sh, and src/frontend/src/handler/create_table.rs.
let catalog_writer = session.catalog_writer()?;
// TODO(iceberg): make iceberg engine table creation ddl atomic
A Member commented:
Yeah, this is a critical issue... especially if we create the table first, before the source and sink. This is because the table is self-contained, while create source has a validation stage that checks whether the upstream system really works, so it has a high chance of failing. Shall we create the source/sink before the table?

chenzl25 (Contributor, Author) replied:
There are some dependencies here. To create an iceberg source, we first need an iceberg table. To create the iceberg table, we need to create an iceberg sink (with create_table_if_not_exists). To create the iceberg sink, we need to create a hummock table first. So we end up with this order: hummock table -> iceberg sink -> iceberg source (see the sketch below).
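
Under the hood, creating one iceberg engine table therefore conceptually expands into a sequence like the sketch below. This only illustrates the ordering described above; the generated __t_* names and all connector options are hypothetical, and the statements are issued internally rather than typed by the user.

-- 1. The hummock-backed table that users see and write to.
create table t (id int primary key, name varchar);

-- 2. An iceberg sink from t, creating the iceberg table if it does not exist.
--    Names and options below are assumptions for illustration.
create sink __t_iceberg_sink from t
with (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    create_table_if_not_exists = 'true',
    database.name = 'demo_db',
    table.name = 't'
);

-- 3. An iceberg source over the same iceberg table, used for batch reads.
create source __t_iceberg_source
with (
    connector = 'iceberg',
    database.name = 'demo_db',
    table.name = 't'
);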

let catalog_writer = session.catalog_writer()?;
// TODO(iceberg): make iceberg engine table creation ddl atomic
catalog_writer
.create_table(source, table, graph, job_type)
A Member commented:
Here the source is passed as well; can it work? 🤔

What I am thinking is that, for a normal table with a connector, the corresponding source is supposed to generate changes that are applied to the table. But here the iceberg source is just for batch reads, which I think is actually irrelevant to and unconnected from the iceberg table internally.

chenzl25 (Contributor, Author) replied on Dec 3, 2024:
The source here is something like a Kafka or Postgres CDC connector, not the iceberg source. For example:

create table t (a int) with (connector = 'kafka' ...) engine = iceberg.

Resolved review thread on src/frontend/src/handler/drop_table.rs.
@chenzl25 chenzl25 requested review from fuyufjh and xxchan December 3, 2024 10:29
@chenzl25 chenzl25 enabled auto-merge December 4, 2024 08:05
@xiangjinwu (Contributor) left a comment:
for Cargo.lock

@chenzl25 chenzl25 added this pull request to the merge queue Dec 5, 2024
Merged via the queue into main with commit 59fa5f8 Dec 5, 2024
38 of 39 checks passed
@chenzl25 chenzl25 deleted the dylan/support_create_iceberg_engine_table branch December 5, 2024 10:51
meta_store_database.clone()
)
}
MetaBackend::Sqlite | MetaBackend::Sql | MetaBackend::Mem => {
A Member commented:
MetaBackend::Sql will be widely adopted since #19560. Shall we support it as well?

chenzl25 (Contributor, Author) replied:
Oh, I previously thought it was deprecated. For the iceberg JDBC catalog right now, we need to know the underlying database implementation to choose the right driver.

A Member replied:
I think we can simply add a jdbc: prefix to the database URL? 🤣

A Member commented:
When the endpoint is retrieved from meta, it has already been converted into a form prefixed with postgres: or mysql:, even when it is configured as MetaBackend::Sql. So MetaBackend::Sql should be unreachable here.

Unrelated to this PR: 🤔 I think we'd better use the sql config in risedev only for testing purposes. To support scenarios where the user and password contain special characters, it's best to specify them separately through environment variables, in both the production environment and the cloud. That's why the subdivided backends were introduced in #17530.

chenzl25 (Contributor, Author) replied:

If the underlying database is Oracle or SQL Server, I think that's acceptable. However, I still want to verify the underlying database type. For instance, SQLite is not a suitable catalog for Iceberg. Concurrent updates to SQLite by both the metadata service and Iceberg can easily cause SQLite to become unresponsive.

@chenzl25 chenzl25 mentioned this pull request Dec 9, 2024