Add testcase to measure scale-out time while having 90% storage utilization #9273

Closed
Lakshmipathi opened this issue Nov 19, 2024 · 2 comments
Labels: area/elastic cloud, area/tablets, P1 Urgent


  • Create a 3-node cluster with rf=3.
  • Once disk utilization reaches 90%, wait for an hour.
  • Perform a scale-out and measure how long the operation takes.
  • Verify that the scale-out completes within 30 minutes on i4i.large.
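
The timing and verification steps above could be sketched roughly as follows. This is only an illustration, not the actual test code: `add_node` and `tablets_balanced` are hypothetical callables standing in for whatever the test harness provides, and the polling interval is an arbitrary choice.

```python
import time
from typing import Callable

SCALE_OUT_BUDGET_S = 30 * 60  # 30-minute target from the steps above


def timed_scale_out(add_node: Callable[[], None],
                    tablets_balanced: Callable[[], bool],
                    poll_interval_s: float = 10.0,
                    budget_s: float = SCALE_OUT_BUDGET_S) -> float:
    """Add a node, wait until tablets are balanced, return elapsed seconds.

    `add_node` and `tablets_balanced` are hypothetical hooks supplied by the
    caller; they are assumptions for this sketch, not real harness APIs.
    """
    start = time.monotonic()
    add_node()
    # Poll until the cluster reports balanced tablets or the budget runs out.
    while not tablets_balanced():
        if time.monotonic() - start > budget_s:
            raise TimeoutError(f"scale-out exceeded the {budget_s:.0f}s budget")
        time.sleep(poll_interval_s)
    return time.monotonic() - start


if __name__ == "__main__":
    # Fake hooks: the cluster reports "balanced" on the third check.
    checks = iter([False, False, True])
    elapsed = timed_scale_out(lambda: None, lambda: next(checks),
                              poll_interval_s=0.01)
    print(f"scale-out took {elapsed:.2f}s")
```

Using `time.monotonic()` rather than wall-clock time keeps the measurement unaffected by clock adjustments during a long run.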
Lakshmipathi self-assigned this Nov 19, 2024
Lakshmipathi added the area/elastic cloud label Nov 19, 2024

Lakshmipathi commented Nov 19, 2024

A 3-node cluster (instance type: i4i.large) reached 91% disk usage:

[2024-11-13T11:32:12.990Z] < t:2024-11-13 11:32:12,962 f:full_storage_utilization_test.py l:170  c:FullStorageUtilizationTest p:INFO  > Current max disk usage after writing to keyspace7: 91% (393 GB / 392.40000000000003 GB)

Waited for 30 minutes, then started a throttled write:

[2024-11-13T12:03:28.417Z] < t:2024-11-13 12:03:27,455 f:stress_thread.py l:325  c:sdcm.stress_thread   p:INFO  > cassandra-stress write no-warmup duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema keyspace=keyspace1 "replication(strategy=NetworkTopologyStrategy,replication_factor=3)" -node 10.4.2.253,10.4.0.91,10.4.3.83 -errors skip-unsupported-columns
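
For a rough sense of how much data this throttled write adds during the scale-out, the parameters in the command can be multiplied out. This is a back-of-the-envelope sketch only: it assumes the throttle is sustained for the full duration and ignores compression, per-row overhead, compaction, and whether the writes overwrite existing keys.

```python
# Values copied from the cassandra-stress command line above.
THROTTLE_OPS_PER_S = 1400   # "throttle=1400/s"
VALUE_SIZE_BYTES = 10240    # -col "size=FIXED(10240) n=FIXED(1)"
DURATION_S = 30 * 60        # duration=30m
REPLICATION_FACTOR = 3      # replication_factor=3


def throttled_write_volume(ops_per_s: int, value_bytes: int,
                           duration_s: int, rf: int) -> dict:
    """Estimate an upper bound on the raw payload written by a throttled run."""
    total_ops = ops_per_s * duration_s
    payload_bytes = total_ops * value_bytes
    return {
        "total_ops": total_ops,
        "payload_bytes_per_s": ops_per_s * value_bytes,
        "payload_gb": payload_bytes / 1e9,            # before replication
        "replicated_gb": payload_bytes * rf / 1e9,    # across all replicas
    }


estimate = throttled_write_volume(THROTTLE_OPS_PER_S, VALUE_SIZE_BYTES,
                                  DURATION_S, REPLICATION_FACTOR)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

Under these assumptions the run issues at most 2,520,000 writes, about 25.8 GB of payload before replication and about 77.4 GB across the three replicas.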

A few minutes later, scaled out by adding a node:

[2024-11-13T12:07:22.875Z] < t:2024-11-13 12:07:22,783 f:full_storage_utilization_test.py l:62   c:FullStorageUtilizationTest p:INFO  > Started adding a new node

The new node was added to the cluster; waiting for tablets to be balanced:

[2024-11-13T12:09:09.450Z] < t:2024-11-13 12:09:09,302 f:full_storage_utilization_test.py l:177  c:FullStorageUtilizationTest p:INFO  > New node added, total nodes in cluster: 4
[2024-11-13T12:10:21.370Z] < t:2024-11-13 12:10:11,650 f:common.py       l:40   c:sdcm.utils.tablets.common p:INFO  > Waiting for tablets to be balanced

Tablets are now balanced:

[2024-11-13T12:33:54.939Z] < t:2024-11-13 12:33:53,453 f:common.py       l:45   c:sdcm.utils.tablets.common p:INFO  > Tablets are balanced

Total time taken to add a node to the 3-node cluster (in seconds, about 26.5 minutes):

[2024-11-13T12:33:54.939Z] < t:2024-11-13 12:33:53,453 f:full_storage_utilization_test.py l:66   c:FullStorageUtilizationTest p:INFO  > Adding a node finished with time: 1590.6698877811432
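
The reported duration can be cross-checked against the log timestamps quoted in this comment. A small sketch, assuming the `t:` fields use the `YYYY-MM-DD HH:MM:SS,mmm` layout seen in the log lines; the helper name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime

# Layout of the "t:" field in the log lines (comma before the milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


started = "2024-11-13 12:07:22,783"   # "Started adding a new node"
finished = "2024-11-13 12:33:53,453"  # "Adding a node finished with time: ..."
reported = 1590.6698877811432         # value printed by the test

elapsed = elapsed_seconds(started, finished)
print(f"elapsed from timestamps: {elapsed:.2f}s ({elapsed / 60:.1f} min)")
print(f"matches reported value:  {abs(elapsed - reported) < 0.01}")
print(f"within 30-minute target: {elapsed < 30 * 60}")
```

The timestamps give 1590.67 s, which agrees with the reported value and is under the 30-minute (1800 s) target.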

Tablet migration over time
Image

max/avg disk utilization
Image

Latency
99th percentile write and read latency by Cluster (max at 90% disk utilization)

operation   p99 latency
write       3.58 ms
read        2.05 ms

https://jenkins.scylladb.com/job/scylla-staging/job/LakshmipathiGanapathi/job/byo-longevity-test/219/console


pehala commented Dec 9, 2024

Closing in favour of #9156

pehala closed this as completed Dec 9, 2024