[Doc] V3.4 - Dynamic Overwrite #54329

34 changes: 34 additions & 0 deletions docs/en/loading/InsertInto.md
@@ -20,6 +20,8 @@ StarRocks v2.4 further supports overwriting data into a table by using INSERT OV
>
> If you need to verify the data before overwriting it, instead of using INSERT OVERWRITE, you can follow the above procedures to overwrite your data and verify it before swapping the partitions.

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables. For more information, see [Dynamic Overwrite](#dynamic-overwrite).

## Precautions

- You can cancel a synchronous INSERT transaction only by pressing the **Ctrl** and **C** keys from your MySQL client.
@@ -356,6 +358,38 @@ WITH LABEL insert_load_wikipedia_ow_3
SELECT event_time, channel FROM source_wiki_edit;
```

### Dynamic Overwrite

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables.

Currently, the default behavior of INSERT OVERWRITE is as follows:

- When overwriting a partitioned table as a whole (that is, without specifying the PARTITION clause), new data records replace the data in their corresponding partitions. Partitions that receive no new data are truncated.
- When overwriting an empty partitioned table (that is, with no partitions in it) and specifying the PARTITION clause, the system returns an error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table and specifying a non-existent partition in the PARTITION clause, the system returns an error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table with data records that do not match any of the partitions specified in the PARTITION clause, the system either returns an error `ERROR 1064 (HY000): Insert has filtered data in strict mode` (if strict mode is enabled) or filters out the unqualified data records (if strict mode is disabled).

The new Dynamic Overwrite semantic behaves differently:

When overwriting a partitioned table as a whole, new data records replace the data in their corresponding partitions. Partitions that receive no new data are left untouched instead of being truncated or deleted. If new data records correspond to a non-existent partition, the system creates that partition.
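
The following is a minimal sketch of the difference between the two semantics, not a verified test case. The table `site_access`, its columns, and the dates are hypothetical and chosen only for illustration. It assumes a table that uses day-level expression partitioning and uses the `dynamic_overwrite` session variable described in this section.

```SQL
-- Hypothetical target table, partitioned by day (illustrative only).
CREATE TABLE site_access (
    event_day DATE NOT NULL,
    site_id   INT,
    pv        BIGINT
)
DUPLICATE KEY(event_day, site_id)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH(site_id);

-- Seed data for two days: 2024-01-01 and 2024-01-02.
INSERT INTO site_access VALUES
    ('2024-01-01', 1, 10),
    ('2024-01-02', 1, 20);

-- Enable Dynamic Overwrite for this session.
SET dynamic_overwrite = true;

-- Overwrite the table as a whole (no PARTITION clause) with rows
-- for 2024-01-02 and 2024-01-03 only.
INSERT OVERWRITE site_access VALUES
    ('2024-01-02', 1, 200),
    ('2024-01-03', 1, 300);

-- Per the Dynamic Overwrite description above, the expectation is:
--   2024-01-01: left untouched (receives no new data)
--   2024-01-02: replaced by the new row
--   2024-01-03: partition created for the new row
-- Under the default semantic, the 2024-01-01 data would be truncated instead.
SELECT * FROM site_access ORDER BY event_day;
```

Running the same overwrite with `dynamic_overwrite` set to `false` on a copy of the table is a simple way to compare the two behaviors side by side.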

The Dynamic Overwrite semantic is disabled by default. To enable it, you need to set the system variable `dynamic_overwrite` to `true`.

Enable Dynamic Overwrite in the current session:

```SQL
SET dynamic_overwrite = true;
```

You can also set it in a hint of the INSERT OVERWRITE statement so that it takes effect for that statement only:

Example:

```SQL
INSERT /*+set_var(dynamic_overwrite = true)*/ OVERWRITE insert_wiki_edit
SELECT * FROM source_wiki_edit;
```

## Insert data into a table with generated columns

A generated column is a special column whose value is derived from a pre-defined expression or evaluation based on other columns. Generated columns are especially useful when your query requests involve evaluations of expensive expressions, for example, querying a certain field from a JSON value, or calculating ARRAY data. StarRocks evaluates the expression and stores the results in the generated columns while data is being loaded into the table, thereby avoiding the expression evaluation during queries and improving the query performance.
8 changes: 8 additions & 0 deletions docs/en/sql-reference/System_variable.md
@@ -373,6 +373,14 @@ Used to enable the streaming pre-aggregations. The default value is `false`, mea

Used for MySQL client compatibility. No practical usage.

### dynamic_overwrite

* **Description**: Whether to enable the [Dynamic Overwrite](./sql-statements/loading_unloading/INSERT.md#dynamic-overwrite) semantic for INSERT OVERWRITE with partitioned tables. Valid values:
  * `true`: Enables Dynamic Overwrite.
  * `false`: Disables Dynamic Overwrite and uses the default semantic.
* **Default**: false
* **Introduced in**: v3.4.0
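
As a small usage sketch (not part of the original change), the variable can be inspected and toggled in a session with standard statements. The commented placeholder stands for whatever INSERT OVERWRITE statements you run.

```SQL
-- Check the current session value.
SHOW VARIABLES LIKE 'dynamic_overwrite';

-- Enable Dynamic Overwrite for the current session.
SET dynamic_overwrite = true;

-- ... run INSERT OVERWRITE statements here ...

-- Restore the default semantic for the rest of the session.
SET dynamic_overwrite = false;
```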

<!--
### enable_collect_table_level_scan_stats (Invisible to users)

32 changes: 32 additions & 0 deletions docs/en/sql-reference/sql-statements/loading_unloading/INSERT.md
@@ -67,6 +67,38 @@ Query OK, 5 rows affected, 2 warnings (0.05 sec)

- After INSERT OVERWRITE statement is executed, StarRocks creates temporary partitions for the partitions that store the original data, inserts data into the temporary partitions, and swaps the original partitions with the temporary partitions. All these operations are executed in the Leader FE node. Therefore, if the Leader FE node crashes while executing INSERT OVERWRITE statement, the whole load transaction fails, and the temporary partitions are deleted.

### Dynamic Overwrite

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables.

Currently, the default behavior of INSERT OVERWRITE is as follows:

- When overwriting a partitioned table as a whole (that is, without specifying the PARTITION clause), new data records replace the data in their corresponding partitions. Partitions that receive no new data are truncated.
- When overwriting an empty partitioned table (that is, with no partitions in it) and specifying the PARTITION clause, the system returns an error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table and specifying a non-existent partition in the PARTITION clause, the system returns an error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table with data records that do not match any of the partitions specified in the PARTITION clause, the system either returns an error `ERROR 1064 (HY000): Insert has filtered data in strict mode` (if strict mode is enabled) or filters out the unqualified data records (if strict mode is disabled).
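
The PARTITION-clause cases in the list above can be sketched as follows. The table `orders_by_day`, its columns, and the partition names are hypothetical and used only for illustration. The comments restate the behavior described in the list rather than captured output.

```SQL
-- Hypothetical range-partitioned table with two named partitions.
CREATE TABLE orders_by_day (
    order_day DATE NOT NULL,
    order_id  BIGINT,
    amount    DECIMAL(10, 2)
)
DUPLICATE KEY(order_day, order_id)
PARTITION BY RANGE(order_day) (
    PARTITION p20240101 VALUES LESS THAN ('2024-01-02'),
    PARTITION p20240102 VALUES LESS THAN ('2024-01-03')
)
DISTRIBUTED BY HASH(order_id);

-- Overwrites a single existing partition.
INSERT OVERWRITE orders_by_day PARTITION (p20240102)
VALUES ('2024-01-02', 1001, 19.99);

-- Names a partition that does not exist. Per the list above, this is
-- expected to fail with an "Unknown partition" analysis error.
INSERT OVERWRITE orders_by_day PARTITION (p20240105)
VALUES ('2024-01-05', 1002, 5.00);

-- The row does not belong to the named partition. Per the list above, it is
-- rejected in strict mode or filtered out when strict mode is disabled.
INSERT OVERWRITE orders_by_day PARTITION (p20240102)
VALUES ('2024-01-01', 1003, 7.50);
```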

The new Dynamic Overwrite semantic behaves differently:

When overwriting a partitioned table as a whole, new data records replace the data in their corresponding partitions. Partitions that receive no new data are left untouched instead of being truncated or deleted. If new data records correspond to a non-existent partition, the system creates that partition.

The Dynamic Overwrite semantic is disabled by default. To enable it, you need to set the system variable `dynamic_overwrite` to `true`.

Enable Dynamic Overwrite in the current session:

```SQL
SET dynamic_overwrite = true;
```

You can also set it in a hint of the INSERT OVERWRITE statement so that it takes effect for that statement only:

Example:

```SQL
INSERT /*+set_var(dynamic_overwrite = true)*/ OVERWRITE insert_wiki_edit
SELECT * FROM source_wiki_edit;
```

## Example

The following examples are based on table `test`, which contains two columns `c1` and `c2`. The `c2` column has a default value of DEFAULT.
34 changes: 34 additions & 0 deletions docs/zh/loading/InsertInto.md
@@ -18,6 +18,8 @@ import InsertPrivNote from '../_assets/commonMarkdown/insertPrivNote.md'

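If you want to verify the data before the swap, you can follow the steps above to implement the overwrite yourself.

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables. For more information, see [Dynamic Overwrite](#dynamic-overwrite).

## Precautions

- You can cancel a synchronous INSERT load job only by pressing `Ctrl` + `C` in your MySQL client.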
@@ -338,6 +340,38 @@ WITH LABEL insert_load_wikipedia_ow_3
SELECT event_time, channel FROM source_wiki_edit;
```

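### Dynamic Overwrite

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables.

Currently, the default behavior of INSERT OVERWRITE is as follows:

- When overwriting a partitioned table as a whole (that is, without specifying the PARTITION clause), new data replaces the data in the corresponding partitions. Existing partitions that are not involved in the overwrite are truncated.
- When overwriting an empty partitioned table (that is, one with no partitions in it) and specifying the PARTITION clause, the system returns the error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table and specifying a non-existent partition, the system returns the error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table with data that does not match the specified partitions, the system returns the error `ERROR 1064 (HY000): Insert has filtered data in strict mode` if strict mode is enabled, or filters out the unqualified data if strict mode is disabled.

The new Dynamic Overwrite semantic differs significantly from the default behavior above:

When overwriting a partitioned table as a whole, new data replaces the data in the corresponding partitions. Partitions that are not involved are kept instead of being truncated or deleted. If new data corresponds to a non-existent partition, the system automatically creates that partition.

The Dynamic Overwrite semantic is disabled by default. To enable it, set the system variable `dynamic_overwrite` to `true`.

Enable Dynamic Overwrite in the current session:

```SQL
SET dynamic_overwrite = true;
```

You can also enable Dynamic Overwrite through a hint in the INSERT OVERWRITE statement, so that it takes effect for that statement only:

Example:

```SQL
INSERT /*+set_var(dynamic_overwrite = true)*/ OVERWRITE insert_wiki_edit
SELECT * FROM source_wiki_edit;
```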

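## Insert data into a table with generated columns

A generated column is a special column whose value is computed automatically from the expression in its column definition. You cannot write to or update a generated column directly. Generated columns are especially useful when your queries involve evaluating expressions, for example, querying a field of a JSON value or computing over ARRAY data. During data loading, StarRocks evaluates the expression and stores the result in the generated column, which avoids evaluating the expression at query time and improves query performance.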
9 changes: 9 additions & 0 deletions docs/zh/sql-reference/System_variable.md
@@ -374,6 +374,15 @@ ALTER USER 'jack' SET PROPERTIES ('session.query_timeout' = '600');
* Description: Used for MySQL client compatibility. No practical usage.
* Default: 4
* Type: Int

### dynamic_overwrite

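* Description: Whether to enable the [Dynamic Overwrite](./sql-statements/loading_unloading/INSERT.md#dynamic-overwrite) semantic when an INSERT OVERWRITE statement overwrites a partitioned table. Valid values:
  * `true`: Enables Dynamic Overwrite.
  * `false`: Disables Dynamic Overwrite and uses the default semantic.
* Default: false
* Introduced in: v3.4.0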

<!--
### enable_collect_table_level_scan_stats (Invisible to users)

32 changes: 32 additions & 0 deletions docs/zh/sql-reference/sql-statements/loading_unloading/INSERT.md
@@ -53,6 +53,38 @@ displayed_sidebar: docs

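- After an INSERT OVERWRITE statement is executed, the system creates temporary partitions for the target partitions, writes the data into the temporary partitions, and finally swaps the temporary partitions with the target partitions atomically to complete the overwrite. All of these steps run on the Leader FE node. Therefore, if the Leader FE node crashes during the overwrite, the INSERT OVERWRITE load fails, and the temporary partitions created in the process are deleted.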

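### Dynamic Overwrite

From v3.4.0 onwards, StarRocks supports a new semantic, Dynamic Overwrite, for INSERT OVERWRITE on partitioned tables.

Currently, the default behavior of INSERT OVERWRITE is as follows:

- When overwriting a partitioned table as a whole (that is, without specifying the PARTITION clause), new data replaces the data in the corresponding partitions. Existing partitions that are not involved in the overwrite are truncated.
- When overwriting an empty partitioned table (that is, one with no partitions in it) and specifying the PARTITION clause, the system returns the error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table and specifying a non-existent partition, the system returns the error `ERROR 1064 (HY000): Getting analyzing error. Detail message: Unknown partition 'xxx' in table 'yyy'`.
- When overwriting a partitioned table with data that does not match the specified partitions, the system returns the error `ERROR 1064 (HY000): Insert has filtered data in strict mode` if strict mode is enabled, or filters out the unqualified data if strict mode is disabled.

The new Dynamic Overwrite semantic differs significantly from the default behavior above:

When overwriting a partitioned table as a whole, new data replaces the data in the corresponding partitions. Partitions that are not involved are kept instead of being truncated or deleted. If new data corresponds to a non-existent partition, the system automatically creates that partition.

The Dynamic Overwrite semantic is disabled by default. To enable it, set the system variable `dynamic_overwrite` to `true`.

Enable Dynamic Overwrite in the current session:

```SQL
SET dynamic_overwrite = true;
```

You can also enable Dynamic Overwrite through a hint in the INSERT OVERWRITE statement, so that it takes effect for that statement only:

Example:

```SQL
INSERT /*+set_var(dynamic_overwrite = true)*/ OVERWRITE insert_wiki_edit
SELECT * FROM source_wiki_edit;
```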

## Example

The following examples are based on table `test`, which contains two columns `c1` and `c2`. The `c2` column has a default value of DEFAULT.