Compactor OOM when run nexmark q19,q20 with minio storage #16685

Closed
huangjw806 opened this issue May 10, 2024 · 2 comments

@huangjw806
Contributor

huangjw806 commented May 10, 2024

test setting:

KUBEBENCH_CONFIG="{'benchmark':{'risingwave':{'storage':{'type':'minio'}},'minio':{'enabled':true,'node_selector':'risingwave.cloud/nodegroup-name:ondemand-ng-test-8c16g-ssd-0','resources':{'cpu':{'limit':'8','request':'7'},'mem':{'limit':'13Gi','request':'13Gi'}},'persistence':{'enabled':false}}}}"
RW_META_STORE="postgresql"
RW_VERSION="nightly-20240507"

test pipeline:

https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3648
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/3639

compactor pod state:

State:          Running
  Started:      Thu, 09 May 2024 08:54:08 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Thu, 09 May 2024 08:49:00 +0000
  Finished:     Thu, 09 May 2024 08:49:04 +0000
Ready:          True
Restart Count:  7
Limits:
  memory:  4Gi
Requests:
  memory:  2Gi
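
For readers unfamiliar with the exit code: 137 is the usual shell-style encoding of "terminated by signal 9" (128 + 9, i.e. SIGKILL), which is consistent with the OOMKilled reason shown above. A minimal sketch of that decoding, assuming a POSIX system; the helper name `decode_exit_code` is made up for illustration and is not part of any RisingWave or Kubernetes tooling:

```python
import signal


def decode_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention."""
    if code > 128:
        try:
            # Codes above 128 conventionally mean "killed by signal (code - 128)".
            sig = signal.Signals(code - 128)
            return f"killed by {sig.name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"exited with status {code}"
    return f"exited with status {code}"


print(decode_exit_code(137))
```
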
@github-actions github-actions bot added this to the release-1.10 milestone May 10, 2024
@hzxa21
Collaborator

hzxa21 commented May 15, 2024

Compactor memory usage is 2x larger in MinIO than in S3. Findings:

  1. Object store throughput is higher in MinIO than in S3.
  2. Object store rates are similar.
  3. Object store latency is also higher in MinIO than in S3.

Findings 1 and 2 make sense, but 3 is surprising. We suspect that 3 is the reason for the OOM: the slower each upload request is, the more buffered data accumulates in the compactor, because uploads run concurrently.
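
The suspicion about latency can be made concrete with Little's law: the average amount of data held in flight is roughly the upload throughput multiplied by the time each upload takes. The sketch below is only a back-of-the-envelope model, not the compactor's actual buffering logic; the function name and all numbers are hypothetical and are not measurements from these runs.

```python
MIB = 1024 * 1024


def in_flight_bytes(throughput_bytes_per_s: float, latency_s: float) -> float:
    """Little's law: average data in flight = arrival rate * time in system."""
    return throughput_bytes_per_s * latency_s


# Hypothetical numbers chosen only to illustrate the relationship.
throughput = 200 * MIB  # bytes per second handed to the uploader
for label, latency in [("lower latency", 0.5), ("higher latency", 1.0)]:
    buffered = in_flight_bytes(throughput, latency)
    print(f"{label}: ~{buffered / MIB:.0f} MiB buffered")
```

Under this model, doubling upload latency at the same throughput doubles the data buffered by concurrent uploads, and higher throughput on top of that increases it further.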

MinIO run:
[image]
[image]

S3 run:
[image]
[image]

@hzxa21
Collaborator

hzxa21 commented Jul 10, 2024

See #15946 (comment)

@hzxa21 hzxa21 closed this as completed Jul 10, 2024