Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute node OOM on ch-benchmark modified q3 with v1.3-rc #12777

Closed
cyliu0 opened this issue Oct 11, 2023 · 11 comments
Closed

Compute node OOM on ch-benchmark modified q3 with v1.3-rc #12777

cyliu0 opened this issue Oct 11, 2023 · 11 comments
Assignees
Labels
Milestone

Comments

@cyliu0
Copy link
Collaborator

cyliu0 commented Oct 11, 2023

Describe the bug

We have a modified ch-benchmark q3 as below. The only difference compared to the original q3 is we removed c_state like 'a%' in the modified one. We didn't hit OOM with the original q3.

create materialized view ch_benchmark_q3
    as
    select   ol_o_id, ol_w_id, ol_d_id,
             sum(ol_amount) as revenue, o_entry_d
    from 	 customer, new_order, orders, order_line
    where
        -- c_state like 'a%' and
        c_id = o_c_id
      and c_w_id = o_w_id
      and c_d_id = o_d_id
      and no_w_id = o_w_id
      and no_d_id = o_d_id
      and no_o_id = o_id
      and ol_w_id = o_w_id
      and ol_d_id = o_d_id
      and ol_o_id = o_id
      and o_entry_d > '2007-01-02 00:00:00.000000'
    group by ol_o_id, ol_w_id, ol_d_id, o_entry_d;

Error message/log

Containers:
  compute:
    Container ID:  containerd://627cd5d007c23a1ff806506cfc922671793c07f76ca232720d92efa627671f11
    Image:         ghcr.io/risingwavelabs/risingwave:v1.3-rc
    Image ID:      ghcr.io/risingwavelabs/risingwave@sha256:a67419ab668c53cfa5ab8ad797b3b743bd159dcc5e019d2f21bc50559f74a752
    Ports:         5688/TCP, 1222/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /risingwave/bin/risingwave
    Args:
      compute-node
    State:          Running
      Started:      Wed, 11 Oct 2023 15:37:03 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 11 Oct 2023 15:27:49 +0800
      Finished:     Wed, 11 Oct 2023 15:36:47 +0800
    Ready:          True

To Reproduce

Use go-tpc to preprare 4 warehouses of ch benchmark needed data in the upstream PostgreSQL.

Create the PG replication and RW sources.

Create the modified q3 MV via the above sql.

Expected behavior

Create the MV successfully.

How did you deploy RisingWave?

kube-bench

The version of RisingWave

v1.3-rc

PostgreSQL 9.5-RisingWave-1.3.0-alpha (6fd2187)

Additional context

No response

@cyliu0 cyliu0 added the type/bug Something isn't working label Oct 11, 2023
@github-actions github-actions bot added this to the release-1.4 milestone Oct 11, 2023
@cyliu0
Copy link
Collaborator Author

cyliu0 commented Oct 11, 2023

The original q7 will cause v1.3-rc & nightly-20231010 compute node OOM as well. But it works with image nightly-20231004.

create materialized view ch_benchmark_q7
    as
    select	 s_nationkey as supp_nation,
               substr(c_state,1,1) as cust_nation,
               extract(year from o_entry_d) as l_year,
               sum(ol_amount) as revenue
    from	 supplier, stock, order_line, orders, customer, nation n1, nation n2
    where	 ol_supply_w_id = s_w_id
      and ol_i_id = s_i_id
      and mod((s_w_id * s_i_id), 10000) = s_suppkey
      and ol_w_id = o_w_id
      and ol_d_id = o_d_id
      and ol_o_id = o_id
      and c_id = o_c_id
      and c_w_id = o_w_id
      and c_d_id = o_d_id
      and s_nationkey = n1.n_nationkey
      and ascii(substr(c_state,1,1)) - 65 = n2.n_nationkey
      and (
            (n1.n_name = 'JAPAN' and n2.n_name = 'CHINA')
            or
            (n1.n_name = 'CHINA' and n2.n_name = 'JAPAN')
        )
      and ol_delivery_d between '2007-01-02 00:00:00.000000' and '2032-01-02 00:00:00.000000'
    group by s_nationkey, substr(c_state,1,1), extract(year from o_entry_d);

@fuyufjh
Copy link
Member

fuyufjh commented Oct 11, 2023

What's the namespace of test? or Buildkite link?

@cyliu0
Copy link
Collaborator Author

cyliu0 commented Oct 11, 2023

@fuyufjh Namespace is cy-ch-pg-cdc. I'm reproducing it with memory profiling enable.

@fuyufjh
Copy link
Member

fuyufjh commented Oct 11, 2023

Caused by MemTable.

image

cy-ch-pg-cdc.zip

It's a known issue cc. @wcy-fdu (didn't find a GitHub issue, can you please create one?)

@wcy-fdu
Copy link
Contributor

wcy-fdu commented Oct 11, 2023

Caused by MemTable.

image [cy-ch-pg-cdc.zip](https://github.com/risingwavelabs/risingwave/files/12868740/cy-ch-pg-cdc.zip)

It's a known issue cc. @wcy-fdu (didn't find a GitHub issue, can you please create one?)

tracked in #12789

@lmatz
Copy link
Contributor

lmatz commented Oct 16, 2023

Since it requires memtable spilling, I suppose it takes time and does not block the release?

@wcy-fdu
Copy link
Contributor

wcy-fdu commented Oct 16, 2023

Since it requires memtable spilling, I suppose it takes time and does not block the release?

Yes, mem table spilling is under testing and I think we can release v1.3 firs.

@cyliu0
Copy link
Collaborator Author

cyliu0 commented Oct 24, 2023

Today, I have ran the ch benchmark modified queries several times. I found that there is another OOM query ch-benchmark modified query q7 OOM https://buildkite.com/risingwave-test/ch-benchmark-pg-cdc/builds/35

And ch-benchmark modified query q21 used 8GB, not sure if it's a problem https://buildkite.com/risingwave-test/ch-benchmark-pg-cdc/builds/39

    create materialized view ch_benchmark_q7
    as
    select	 s_nationkey as supp_nation,
               substr(c_state,1,1) as cust_nation,
               extract(year from o_entry_d) as l_year,
               sum(ol_amount) as revenue
    from	 supplier, stock, order_line, orders, customer, nation n1, nation n2
    where	 ol_supply_w_id = s_w_id
      and ol_i_id = s_i_id
      and mod((s_w_id * s_i_id), 10000) = s_suppkey
      and ol_w_id = o_w_id
      and ol_d_id = o_d_id
      and ol_o_id = o_id
      and c_id = o_c_id
      and c_w_id = o_w_id
      and c_d_id = o_d_id
      and s_nationkey = n1.n_nationkey
      and ascii(substr(c_state,1,1)) - 65 = n2.n_nationkey
      -- and (
      --      (n1.n_name = 'JAPAN' and n2.n_name = 'CHINA')
      --      or
      --      (n1.n_name = 'CHINA' and n2.n_name = 'JAPAN')
      --  )
      and ol_delivery_d between '2007-01-02 00:00:00.000000' and '2032-01-02 00:00:00.000000'
    group by s_nationkey, substr(c_state,1,1), extract(year from o_entry_d);
    
    create materialized view ch_benchmark_q21
    as
    select	 s_name, count(*) as numwait
    from	 supplier, order_line l1, orders, stock, nation
    where	 ol_o_id = o_id
    and ol_w_id = o_w_id
    and ol_d_id = o_d_id
    and ol_w_id = s_w_id
    and ol_i_id = s_i_id
    and mod((s_w_id * s_i_id),10000) = s_suppkey
    and l1.ol_delivery_d > (o_entry_d - interval '1 year')
    and not exists (select *
    from	order_line l2
    where  l2.ol_o_id = l1.ol_o_id
    and l2.ol_w_id = l1.ol_w_id
    and l2.ol_d_id = l1.ol_d_id
    and l2.ol_delivery_d > (l1.ol_delivery_d - interval '1 year'))
    and s_nationkey = n_nationkey
    -- and n_name = 'CHINA'
    group by s_name;
image

@lmatz
Copy link
Contributor

lmatz commented Oct 24, 2023

if the memtable spilling PR is ready, test its image with the same queries again before merging the PR?

@wcy-fdu
Copy link
Contributor

wcy-fdu commented Oct 24, 2023

if the memtable spilling PR is ready, test its image with the same queries again before merging the PR?

Not ready yet, but will be soon.

@hzxa21 hzxa21 modified the milestones: release-1.4, release-1.5 Nov 8, 2023
@wcy-fdu
Copy link
Contributor

wcy-fdu commented Nov 8, 2023

PR of mem table spill is merged, after rerun this workload, no more oom occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants