diff --git a/docs/quick_start/transform/_category_.json b/docs/quick_start/transform/_category_.json
new file mode 100644
index 00000000000..c09210a66b3
--- /dev/null
+++ b/docs/quick_start/transform/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Transform",
+  "position": 6
+}
\ No newline at end of file
diff --git a/docs/quick_start/transform/sdk_example.md b/docs/quick_start/transform/sdk_example.md
new file mode 100644
index 00000000000..f55fd95ccff
--- /dev/null
+++ b/docs/quick_start/transform/sdk_example.md
@@ -0,0 +1,77 @@
+---
+title: SDK Usage Example
+sidebar_position: 1
+---
+
+# Prerequisites
+- JDK 1.8 or above
+- Maven 2.5 or above
+
+# Installing SDK Dependencies
+To use the SDK, include its library in your project. The library can be obtained in either of two ways:
+- Obtain the source code, compile it yourself, and deploy the SDK package to your local repository. See [How to Compile](https://inlong.apache.org/docs/next/quick_start/how_to_build/) for details.
+- Directly reference the published library in the Apache repository:
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>transform-sdk</artifactId>
+    <version>1.13.0</version>
+</dependency>
+```
+
+# Specific Examples
+## Transform Requirements
+- Extract the video playback start events in the Shenzhen region. The original fields are:
+  - event_time
+  - zone, optional values: [ shenzhen, shanghai, beijing ]
+  - video_id
+  - username
+  - operation_type, optional values: [ start, end ]
+- Raw test data in CSV format, delimited by vertical bars (`|`):
+```shell
+2024-05-09 20:00:01|shenzhen|1111|zhangsan|start
+2024-05-09 20:00:02|shanghai|1111|lisi|start
+2024-05-09 20:00:03|shanghai|1111|lisi|end
+2024-05-09 20:00:04|shenzhen|1111|zhangsan|end
+2024-05-09 20:00:05|beijing|1111|zhangsan|start
+2024-05-09 20:00:06|beijing|1111|zhangsan|end
+```
+- Expected result data in KV format (`key=value` pairs joined by `&`):
+```shell
+event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start
+```
+## Transform SDK Implementation
+### Configure the Source Data
+```java
+// source: charset, delimiter, escape character, field definitions (null; the SQL refers to columns as $1, $2, ...)
+SourceInfo csvSource = new CsvSourceInfo("UTF-8", "|", "\\", null);
+```
+### Configure the Sink Data
+```java
+// sink: charset, field definitions (null)
+SinkInfo kvSink = new KvSinkInfo("UTF-8", null);
+```
+### Define the Transformation Logic
+```java
+String transformSql = "select $1 event_time,$2 zone,$3 video_id,$4 username,$5 operation_type from source where $2='shenzhen' and $5='start'";
+TransformConfig config = new TransformConfig(csvSource, kvSink, transformSql);
+```
+### Build the SDK Object and Execute the Transformation
+```java
+TransformProcessor processor = new TransformProcessor(config);
+
+String srcString = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start\n"
+        + "2024-05-09 20:00:02|shanghai|1111|lisi|start\n"
+        + "2024-05-09 20:00:03|shanghai|1111|lisi|end\n"
+        + "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end\n"
+        + "2024-05-09 20:00:05|beijing|1111|zhangsan|start\n"
+        + "2024-05-09 20:00:06|beijing|1111|zhangsan|end";
+
+List<String> outputs = processor.transform(srcString, new HashMap<>());
+
+// Expected output: [event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start]
+System.out.println(outputs);
+```
+
+# More Transform Examples
+- For more examples, see [More Examples](https://github.com/apache/inlong/blob/master/inlong-sdk/transform-sdk/src/test/java/org/apache/inlong/sdk/transform/process).
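To make the semantics of the transform SQL concrete, the following plain-Java sketch reproduces the same filter and projection by hand, without the SDK. It is an illustration only, not how `TransformProcessor` works internally; the class name `ShenzhenStartFilter` is hypothetical, and the sketch assumes simple `|`-delimited lines with no escaped or quoted delimiters.

```java
import java.util.ArrayList;
import java.util.List;

public class ShenzhenStartFilter {
    // Output field names, mirroring the aliases given to $1..$5 in the transform SQL.
    private static final String[] FIELDS = {"event_time", "zone", "video_id", "username", "operation_type"};

    // Hand-written equivalent of:
    //   select $1 event_time, ..., $5 operation_type from source where $2='shenzhen' and $5='start'
    // Assumption: plain '|'-delimited lines, no escape characters or quoting.
    public static List<String> transform(String input) {
        List<String> outputs = new ArrayList<>();
        for (String line : input.split("\n")) {
            String[] cols = line.split("\\|", -1);
            if (cols.length != FIELDS.length) {
                continue; // skip malformed rows in this sketch
            }
            // WHERE clause: $n is 1-based, Java arrays are 0-based
            if (!"shenzhen".equals(cols[1]) || !"start".equals(cols[4])) {
                continue;
            }
            // Projection: emit every column under its alias as key=value pairs joined by '&'
            StringBuilder kv = new StringBuilder();
            for (int i = 0; i < FIELDS.length; i++) {
                if (i > 0) {
                    kv.append('&');
                }
                kv.append(FIELDS[i]).append('=').append(cols[i]);
            }
            outputs.add(kv.toString());
        }
        return outputs;
    }

    public static void main(String[] args) {
        String src = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start\n"
                + "2024-05-09 20:00:02|shanghai|1111|lisi|start\n"
                + "2024-05-09 20:00:03|shanghai|1111|lisi|end\n"
                + "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end\n"
                + "2024-05-09 20:00:05|beijing|1111|zhangsan|start\n"
                + "2024-05-09 20:00:06|beijing|1111|zhangsan|end";
        // prints [event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start]
        System.out.println(transform(src));
    }
}
```

Only the first sample row satisfies both conditions (`zone` is `shenzhen` and `operation_type` is `start`), which is why a single KV line is expected.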
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/transform/sdk_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/transform/sdk_example.md
new file mode 100644
index 00000000000..ec345bc09d2
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/transform/sdk_example.md
@@ -0,0 +1,77 @@
+---
+title: SDK Usage Example
+sidebar_position: 1
+---
+
+# Prerequisites
+- JDK 1.8 or above
+- Maven 2.5 or above
+
+# Installing SDK Dependencies
+To use the SDK, include the SDK library in your project. The library can be obtained in either of two ways:
+- Obtain the source code, compile it yourself, and deploy the SDK package to your local repository. For details, see [How to Compile](https://inlong.apache.org/docs/next/quick_start/how_to_build/).
+- Directly reference the existing library in the Apache repository:
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>transform-sdk</artifactId>
+    <version>1.13.0</version>
+</dependency>
+```
+
+# Specific Examples
+## Transform Requirements
+- Extract the video playback start events in the shenzhen region. The original fields are:
+  - event_time, the event time
+  - zone, the region, optional values: [ shenzhen, shanghai, beijing ]
+  - video_id, the video ID
+  - username, the user name
+  - operation_type, the operation type, optional values: [ start, end ]
+- Raw test data in CSV format, delimited by vertical bars (`|`):
+```shell
+2024-05-09 20:00:01|shenzhen|1111|zhangsan|start
+2024-05-09 20:00:02|shanghai|1111|lisi|start
+2024-05-09 20:00:03|shanghai|1111|lisi|end
+2024-05-09 20:00:04|shenzhen|1111|zhangsan|end
+2024-05-09 20:00:05|beijing|1111|zhangsan|start
+2024-05-09 20:00:06|beijing|1111|zhangsan|end
+```
+- Expected result data in KV format (`key=value` pairs joined by `&`):
+```shell
+event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start
+```
+## Transform SDK Implementation
+### Configure the Source Data
+```java
+// source: charset, delimiter, escape character, field definitions (null; the SQL refers to columns as $1, $2, ...)
+SourceInfo csvSource = new CsvSourceInfo("UTF-8", "|", "\\", null);
+```
+### Configure the Sink Data
+```java
+// sink: charset, field definitions (null)
+SinkInfo kvSink = new KvSinkInfo("UTF-8", null);
+```
+### Define the Transformation Logic
+```java
+String transformSql = "select $1 event_time,$2 zone,$3 video_id,$4 username,$5 operation_type from source where $2='shenzhen' and $5='start'";
+TransformConfig config = new TransformConfig(csvSource, kvSink, transformSql);
+```
+### Build the SDK Object and Execute the Transformation
+```java
+TransformProcessor processor = new TransformProcessor(config);
+
+String srcString = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start\n"
+        + "2024-05-09 20:00:02|shanghai|1111|lisi|start\n"
+        + "2024-05-09 20:00:03|shanghai|1111|lisi|end\n"
+        + "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end\n"
+        + "2024-05-09 20:00:05|beijing|1111|zhangsan|start\n"
+        + "2024-05-09 20:00:06|beijing|1111|zhangsan|end";
+
+List<String> outputs = processor.transform(srcString, new HashMap<>());
+
+// Expected output: [event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start]
+System.out.println(outputs);
+```
+
+# More Transform Examples
+- For details, see [More Examples](https://github.com/apache/inlong/blob/master/inlong-sdk/transform-sdk/src/test/java/org/apache/inlong/sdk/transform/process).
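The expected result is a single KV-format line: `key=value` pairs joined by `&`. When checking transform output, for example in a unit test, it can help to split such a line back into its fields. The helper below is a hypothetical plain-Java sketch, not part of the SDK (`KvLineParser` is an illustrative name); it assumes keys contain no `=` and values contain no `&`, which holds for the sample data but may not match how the KV sink escapes special characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KvLineParser {
    // Parses one KV-format line such as "a=1&b=2" into a map that preserves field order.
    // Assumption: no escaping; the first '=' in each pair separates key from value.
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : line.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                fields.put(pair, ""); // a bare key is treated as having an empty value
            } else {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> row = parse(
                "event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start");
        // prints: shenzhen start
        System.out.println(row.get("zone") + " " + row.get("operation_type"));
    }
}
```

Using a `LinkedHashMap` keeps the fields in the same order as the SQL projection, which makes assertion failures easier to read.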