[ISSUE-197] Support translate complex join to match #198
What changes were proposed in this pull request?
For users who are not familiar with GQL, using Match directly to process graphs can be difficult. GeaFlow aims to materialize joins using graphs and to use Match to compute joins indirectly. We therefore want users to be able to write Join queries in a SQL-like manner and have GeaFlow DSL convert them to GQL automatically. This syntactic transformation is intended to preserve performance while letting users benefit from graph computing without writing any GQL.
In Phase 1 (see issue #153), we supported transforming general SQL joins into GQL GraphMatch, but the join inputs had to be simple vertex or edge scans. The goal of Phase 2 is to support nesting single-input operators that preprocess vertices or edges, such as project, filter, and aggregate. In Phase 2, only inner joins are supported so far, but the same additional operations can also be applied to the join results.
Of course, this transformation requires a clear graph schema. Users still need to define the graph schema in advance and declare the graph used for the query. This is the same as in issue #153, so we do not go into detail here.
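The idea behind rewriting a join as a match can be illustrated with a minimal Python sketch. It is not GeaFlow's implementation and uses none of its APIs. The toy data and the helper names `sql_style_join` and `match_style_join` are assumptions made for illustration. It shows that an inner join of an edge table with a vertex table on `srcId = id` returns the same rows as starting at each vertex and walking its outgoing edges, which is what a one-hop match does.

```python
from collections import defaultdict

# Hypothetical toy data: the field names mirror the query below, the values are made up.
students = [{"id": 1}, {"id": 2}, {"id": 3}]
select_course = [
    {"srcId": 1, "targetId": 101, "ts": 10},
    {"srcId": 1, "targetId": 102, "ts": 11},
    {"srcId": 2, "targetId": 101, "ts": 12},
    {"srcId": 4, "targetId": 103, "ts": 13},  # dangling edge: there is no student 4
]


def sql_style_join(edges, vertices):
    """Relational inner join on selectCourse.srcId = student.id (nested loop)."""
    return sorted(
        (v["id"], e["targetId"], e["ts"])
        for e in edges
        for v in vertices
        if e["srcId"] == v["id"]
    )


def match_style_join(edges, vertices):
    """Graph-style evaluation: start at each student vertex and walk its out-edges."""
    out_edges = defaultdict(list)
    for e in edges:
        out_edges[e["srcId"]].append(e)  # adjacency list keyed by source vertex id
    return sorted(
        (v["id"], e["targetId"], e["ts"])
        for v in vertices
        for e in out_edges[v["id"]]
    )


join_rows = sql_style_join(select_course, students)
match_rows = match_style_join(select_course, students)
```

Both functions drop the dangling edge and the student with no edges, which is the inner-join behaviour Phase 2 is limited to. The graph-style version avoids comparing every edge with every vertex because the adjacency list already groups edges by source vertex.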
As a result, we can now support executing practical SQL queries in GQL, such as the following:
USE GRAPH g_student;
INSERT INTO aggregate_to_match_003_result
select
  table_26.col_4 as col_4,
  col_27 as col_32,
  col_28 as col_33,
  col_29 as col_34,
  col_30 as col_35
from
  (
    select
      table_13.col_6 as col_4,
      sum(col_14) as col_27,
      count(
        distinct IF(table_13.col_2 % 2 = 0, table_13.col_3, cast(null as bigint))
      ) as col_28,
      sum(col_15) as col_29,
      count(
        distinct IF(table_13.col_2 % 2 = 1, table_13.col_3, cast(null as bigint))
      ) as col_30
    from
      (
        select
          table_12.col_4 as col_4,
          table_10.id as col_6,
          table_12.col_2 as col_2,
          table_12.col_3 as col_3,
          count(
            IF(table_12.col_2 % 2 = 0, table_12.col_3, cast(null as bigint))
          ) as col_14,
          count(
            IF(table_12.col_2 % 2 = 1, table_12.col_3, cast(null as bigint))
          ) as col_15
        from
          (
            select srcId as col_2, targetId as col_3, ts as col_4
            from selectCourse table11
          ) table_12, student table_10 where table_12.col_2 = table_10.id
        group by
          table_12.col_4,
          table_10.id,
          table_12.col_3,
          table_12.col_2
      ) table_13
      INNER JOIN (
        select
          table_24.col_6 as col_6
        from
          (
            select
              table_21.id as col_6
            from
              (
                select
                  table_22.srcId as col_19
                from
                  hasMonitor table_22
              ) table_23
              INNER JOIN student table_21 on table_23.col_19 = table_21.id
            group by
              table_21.id
          ) table_24
        group by
          table_24.col_6
      ) table_25 on table_13.col_6 = table_25.col_6
      and table_13.col_6 = table_25.col_6
    group by
      table_13.col_6
  ) table_26
order by
  col_32 DESC,
  col_4 DESC
limit
  10000
How was this PR tested?