[INLONG-1017][Doc] Add quick-start document of Transform. (#1022)
luchunliang authored Sep 29, 2024
1 parent f38368a commit e25fbc9
Showing 3 changed files with 159 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/quick_start/transform/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Transform",
"position": 6
}
78 changes: 78 additions & 0 deletions docs/quick_start/transform/sdk_example.md
@@ -0,0 +1,78 @@
---
title: SDK Usage Example
sidebar_position: 1
---

# Prerequisites
- JDK 1.8 or above
- Maven 2.5 or above

# Installing SDK Dependencies
To use the SDK, add the SDK library to your project. The library can be obtained in either of the following two ways:
- Obtain the source code, compile it yourself, and deploy the SDK package to your local repository. See [How to Compile](https://inlong.apache.org/docs/next/quick_start/how_to_build/) for details.
- Reference the published library in the Apache repository directly:
```xml
<dependency>
<groupId>org.apache.inlong</groupId>
<artifactId>transform-sdk</artifactId>
<version>1.13.0</version>
</dependency>
```

# Example
## Transform Requirements
- Select the video playback start events in the Shenzhen region. The original fields are:
  - event_time
  - zone, possible values: [ shenzhen, shanghai, beijing ]
  - video_id
  - username
  - operation_type, possible values: [ start, end ]
- Raw test data in CSV format, delimited by a vertical bar (`|`):
```shell
2024-05-09 20:00:01|shenzhen|1111|zhangsan|start
2024-05-09 20:00:02|shanghai|1111|lisi|start
2024-05-09 20:00:03|shanghai|1111|lisi|end
2024-05-09 20:00:04|shenzhen|1111|zhangsan|end
2024-05-09 20:00:05|beijing|1111|zhangsan|start
2024-05-09 20:00:06|beijing|1111|zhangsan|end
```
- Expected result data in KV format:
```shell
event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start
```
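As a cross-check of the requirement, the same selection can be written as a plain-Java predicate over the sample rows. This is a hypothetical sketch that does not use the Transform SDK. The class name `ShenzhenStartFilter` is illustrative, and the sketch assumes the rows contain no escaped delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java cross-check of the requirement; this does NOT use the InLong Transform SDK.
final class ShenzhenStartFilter {

    // Keeps the raw rows whose zone (2nd field) is "shenzhen" and whose
    // operation_type (5th field) is "start".
    static List<String> select(List<String> rows) {
        List<String> selected = new ArrayList<>();
        for (String row : rows) {
            // Assumes no escaped delimiters in the data, so a plain split on '|' suffices.
            String[] fields = row.split("\\|", -1);
            if (fields.length == 5 && "shenzhen".equals(fields[1]) && "start".equals(fields[4])) {
                selected.add(row);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start",
                "2024-05-09 20:00:02|shanghai|1111|lisi|start",
                "2024-05-09 20:00:03|shanghai|1111|lisi|end",
                "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end",
                "2024-05-09 20:00:05|beijing|1111|zhangsan|start",
                "2024-05-09 20:00:06|beijing|1111|zhangsan|end");
        // Prints: [2024-05-09 20:00:01|shenzhen|1111|zhangsan|start]
        System.out.println(select(rows));
    }
}
```

Only the first sample row satisfies both conditions, so exactly one record is expected from the transformation.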
## Transform SDK Implementation
### Configure the Source Data
```java
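// source: UTF-8 CSV text, '|' as the field delimiter, backslash as the escape character, field list left null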
SourceInfo csvSource = new CsvSourceInfo("UTF-8", "|", "\\", null);
```
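The `"|"` and `"\\"` arguments correspond to the delimiter and escape character of the sample data. The hypothetical `EscapedSplit` helper below illustrates one common interpretation of an escape character, in which an escaped delimiter stays inside the field. It is an assumption for illustration only, not InLong's CSV decoder, and it ignores quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of delimiter + escape-character splitting; NOT InLong's parser.
final class EscapedSplit {

    // Splits one line on the delimiter. A character preceded by the escape character is
    // taken literally, so an escaped delimiter does not end the field. Empty fields are kept.
    static List<String> split(String line, char delimiter, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escaped) {
                current.append(c);   // escaped character is copied as-is
                escaped = false;
            } else if (c == escape) {
                escaped = true;      // the next character is literal
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            current.append(escape);  // a dangling escape at the end of the line is kept
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Prints: [2024-05-09 20:00:01, shenzhen, 1111, zhang|san, start]
        System.out.println(split("2024-05-09 20:00:01|shenzhen|1111|zhang\\|san|start", '|', '\\'));
    }
}
```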
### Configure the Result (Sink) Data
```java
// sink: encode results as UTF-8 KV text, field list left null
SinkInfo kvSink = new KvSinkInfo("UTF-8", null);
```
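The expected result shows the KV layout: `key=value` pairs joined by `&`, in the order of the select list. Below is a minimal sketch of that layout. It assumes that `=` and `&` inside values are not escaped, and the real `KvSinkInfo` encoder may handle such values differently. `KvLine` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the KV output layout; NOT InLong's KV encoder.
final class KvLine {

    // Joins entries as key=value pairs separated by '&', in the map's iteration order.
    // No escaping is applied (an assumption of this sketch).
    static String encode(Map<String, String> record) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : record.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>(); // keeps field (insertion) order
        record.put("event_time", "2024-05-09 20:00:01");
        record.put("zone", "shenzhen");
        record.put("video_id", "1111");
        record.put("username", "zhangsan");
        record.put("operation_type", "start");
        // Prints: event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start
        System.out.println(encode(record));
    }
}
```

A `LinkedHashMap` is used so that the output fields appear in the same order as the select list.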
### Define the Transformation Logic
```java
String transformSql = "select $1 event_time,$2 zone,$3 video_id,$4 username,$5 operation_type from source where $2='shenzhen' and $5='start' ";
TransformConfig config = new TransformConfig(csvSource, kvSink, transformSql);
```
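The select list refers to CSV columns by position: `$1` is the first field and `$5` the fifth, and each one is renamed by the alias that follows it. The sketch below mimics only this projection step in plain Java. `PositionalProjection` and its regex-based parsing are illustrative assumptions, not how InLong's SQL engine works, and the `where` clause is not evaluated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "$n alias" projection; NOT InLong's SQL engine.
final class PositionalProjection {

    // Matches items such as "$1 event_time" in the select list.
    private static final Pattern ITEM = Pattern.compile("\\$(\\d+)\\s+(\\w+)");

    // Resolves each "$n alias" item against one parsed row; $1 refers to fields[0].
    // Out-of-range positions map to null.
    static Map<String, String> project(String selectList, String[] fields) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = ITEM.matcher(selectList);
        while (m.find()) {
            int position = Integer.parseInt(m.group(1));
            String value = (position >= 1 && position <= fields.length) ? fields[position - 1] : null;
            out.put(m.group(2), value);
        }
        return out;
    }

    public static void main(String[] args) {
        String selectList = "$1 event_time,$2 zone,$3 video_id,$4 username,$5 operation_type";
        String[] row = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start".split("\\|");
        // Prints: {event_time=2024-05-09 20:00:01, zone=shenzhen, video_id=1111, username=zhangsan, operation_type=start}
        System.out.println(project(selectList, row));
    }
}
```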
### Build the SDK Object and Run the Transformation
```java
import java.util.HashMap;
import java.util.List;

// Create the processor from the configuration built above
TransformProcessor processor = new TransformProcessor(config);

// Sample input: the six CSV rows, separated by '\n'
String srcString = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start\n"
        + "2024-05-09 20:00:02|shanghai|1111|lisi|start\n"
        + "2024-05-09 20:00:03|shanghai|1111|lisi|end\n"
        + "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end\n"
        + "2024-05-09 20:00:05|beijing|1111|zhangsan|start\n"
        + "2024-05-09 20:00:06|beijing|1111|zhangsan|end";

// Transform the sample input, passing an empty parameter map
List<String> outputs = processor.transform(srcString, new HashMap<>());

// Expected outcome: [event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start]
System.out.println(outputs);
```

# More Transform Examples
- For more examples, please see [More Examples](https://github.com/apache/inlong/blob/master/inlong-sdk/transform-sdk/src/test/java/org/apache/inlong/sdk/transform/process).
@@ -0,0 +1,77 @@
---
title: SDK Usage Example
sidebar_position: 1
---

# Prerequisites
- JDK 1.8 or above
- Maven 2.5 or above

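# Installing SDK Dependencies
To use the SDK, add the SDK library to your project. The library can be obtained in either of the following two ways:
- Obtain the source code, compile it yourself, and deploy the SDK package to your local repository. See [How to Compile](https://inlong.apache.org/docs/next/quick_start/how_to_build/) for details.
- Reference the published library in the Apache repository directly: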
```xml
<dependency>
<groupId>org.apache.inlong</groupId>
<artifactId>transform-sdk</artifactId>
<version>1.13.0</version>
</dependency>
```

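# Example
## Transform Requirements
- Select the video playback start events in the Shenzhen region. The original fields are:
  - event_time, the event time
  - zone, the region; possible values: [ shenzhen, shanghai, beijing ]
  - video_id, the video ID
  - username, the user name
  - operation_type, the operation type; possible values: [ start, end ]
- Raw test data in CSV format, delimited by a vertical bar (`|`):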
```shell
2024-05-09 20:00:01|shenzhen|1111|zhangsan|start
2024-05-09 20:00:02|shanghai|1111|lisi|start
2024-05-09 20:00:03|shanghai|1111|lisi|end
2024-05-09 20:00:04|shenzhen|1111|zhangsan|end
2024-05-09 20:00:05|beijing|1111|zhangsan|start
2024-05-09 20:00:06|beijing|1111|zhangsan|end
```
- Expected result data in KV format:
```shell
event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start
```
## Transform SDK Implementation
### Configure the Source Data
```java
// source
SourceInfo csvSource = new CsvSourceInfo("UTF-8", "|", "\\", null);
```
### Configure the Result (Sink) Data
```java
// sink
SinkInfo kvSink = new KvSinkInfo("UTF-8", null);
```
### Define the Transformation Logic
```java
String transformSql = "select $1 event_time,$2 zone,$3 video_id,$4 username,$5 operation_type from source where $2='shenzhen' and $5='start' ";
TransformConfig config = new TransformConfig(csvSource, kvSink, transformSql);
```
### Build the SDK Object and Run the Transformation
```java
TransformProcessor processor = new TransformProcessor(config);

String srcString = "2024-05-09 20:00:01|shenzhen|1111|zhangsan|start\n"
+ "2024-05-09 20:00:02|shanghai|1111|lisi|start\n"
+ "2024-05-09 20:00:03|shanghai|1111|lisi|end\n"
+ "2024-05-09 20:00:04|shenzhen|1111|zhangsan|end\n"
+ "2024-05-09 20:00:05|beijing|1111|zhangsan|start\n"
+ "2024-05-09 20:00:06|beijing|1111|zhangsan|end";

List<String> outputs = processor.transform(srcString, new HashMap<>());

// Expected outcome: [event_time=2024-05-09 20:00:01&zone=shenzhen&video_id=1111&username=zhangsan&operation_type=start]
System.out.println(outputs);
```

# More Transform Examples
- For more examples, see [More Examples](https://github.com/apache/inlong/blob/master/inlong-sdk/transform-sdk/src/test/java/org/apache/inlong/sdk/transform/process).
