Add testcase to measure scale-out time while having 90% storage utilization #9273

Lakshmipathi · 2024-11-19T06:52:09Z

Create a 3-node cluster with RF=3.
After reaching 90% disk utilization, wait for an hour.
Then perform a scale-out and measure the time taken by the scale-out operation.
Verify that the scale-out completes within 30 minutes on i4i.large.
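
The timing and verification steps could be sketched roughly as follows. This is only an illustration, not the actual test code: `timed`, `check_scale_out` and `SCALE_OUT_LIMIT_SECONDS` are hypothetical names, and the callable passed to `timed` stands in for whatever adds the node and waits for tablets to be balanced.

```python
import time
from typing import Callable

# 30-minute target for i4i.large, taken from the issue description.
SCALE_OUT_LIMIT_SECONDS = 30 * 60


def timed(operation: Callable[[], None]) -> float:
    """Run `operation` and return its wall-clock duration in seconds."""
    start = time.monotonic()
    operation()
    return time.monotonic() - start


def check_scale_out(duration_seconds: float,
                    limit_seconds: float = SCALE_OUT_LIMIT_SECONDS) -> None:
    """Fail if the scale-out took longer than the allowed limit."""
    assert duration_seconds <= limit_seconds, (
        f"Scale-out took {duration_seconds:.1f}s, limit is {limit_seconds}s"
    )


# Placeholder operation; a real test would add a node and wait for balancing here.
duration = timed(lambda: None)
check_scale_out(duration)
```

`time.monotonic()` is used instead of `time.time()` so that clock adjustments during a long-running test do not distort the measurement.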

Lakshmipathi · 2024-11-19T06:58:56Z

On a 3-node cluster (instance type: i4i.large), disk usage reached 91%:

[2024-11-13T11:32:12.990Z] < t:2024-11-13 11:32:12,962 f:full_storage_utilization_test.py l:170  c:FullStorageUtilizationTest p:INFO  > Current max disk usage after writing to keyspace7: 91% (393 GB / 392.40000000000003 GB)

Waited for 30 minutes, then started a throttled write:

[2024-11-13T12:03:28.417Z] < t:2024-11-13 12:03:27,455 f:stress_thread.py l:325  c:sdcm.stress_thread   p:INFO  > cassandra-stress write no-warmup duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema keyspace=keyspace1 "replication(strategy=NetworkTopologyStrategy,replication_factor=3)" -node 10.4.2.253,10.4.0.91,10.4.3.83 -errors skip-unsupported-columns
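
For a rough sense of the load this command generates, the arithmetic below derives the payload volume from its parameters (`throttle=1400/s`, `duration=30m`, `size=FIXED(10240)`, `replication_factor=3`). It is only a back-of-the-envelope sketch under assumptions not stated in the issue: that the throttle is a cluster-wide cap that is actually reached, and that per-row overhead and compression are ignored. Because the population range `seq=1..5000000` may overlap existing rows, this is an upper bound on payload written, not net disk growth.

```python
# Parameters copied from the cassandra-stress command line above.
THROTTLE_OPS_PER_SECOND = 1400
DURATION_SECONDS = 30 * 60
COLUMN_SIZE_BYTES = 10240
REPLICATION_FACTOR = 3

# Upper bound on rows written if the throttle is sustained for the whole run.
rows_written = THROTTLE_OPS_PER_SECOND * DURATION_SECONDS

# Payload bytes before replication, and across all replicas.
payload_bytes = rows_written * COLUMN_SIZE_BYTES
replicated_bytes = payload_bytes * REPLICATION_FACTOR

print(f"rows: {rows_written}")
print(f"payload: {payload_bytes / 1e9:.2f} GB")
print(f"payload across replicas: {replicated_bytes / 1e9:.2f} GB")
```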

A few minutes later, scaled out by adding a node:

[2024-11-13T12:07:22.875Z] < t:2024-11-13 12:07:22,783 f:full_storage_utilization_test.py l:62   c:FullStorageUtilizationTest p:INFO  > Started adding a new node

The new node was added to the cluster; waiting for tablets to be balanced:

[2024-11-13T12:09:09.450Z] < t:2024-11-13 12:09:09,302 f:full_storage_utilization_test.py l:177  c:FullStorageUtilizationTest p:INFO  > New node added, total nodes in cluster: 4
[2024-11-13T12:10:21.370Z] < t:2024-11-13 12:10:11,650 f:common.py       l:40   c:sdcm.utils.tablets.common p:INFO  > Waiting for tablets to be balanced

Tablets are balanced now

[2024-11-13T12:33:54.939Z] < t:2024-11-13 12:33:53,453 f:common.py       l:45   c:sdcm.utils.tablets.common p:INFO  > Tablets are balanced

Total time taken to add a node in a 3-node cluster is:

[2024-11-13T12:33:54.939Z] < t:2024-11-13 12:33:53,453 f:full_storage_utilization_test.py l:66   c:FullStorageUtilizationTest p:INFO  > Adding a node finished with time: 1590.6698877811432
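
As a cross-check, the reported duration can be reproduced from the timestamps in the log lines above ("Started adding a new node" and "Tablets are balanced"). This is a minimal sketch; `elapsed_seconds` is a hypothetical helper, not part of the test framework.

```python
from datetime import datetime

# Timestamp layout of the `t:` field in the test log lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


# "Started adding a new node" -> "Tablets are balanced"
scale_out_seconds = elapsed_seconds("2024-11-13 12:07:22,783",
                                    "2024-11-13 12:33:53,453")
within_target = scale_out_seconds <= 30 * 60

print(f"{scale_out_seconds:.2f} s ({scale_out_seconds / 60:.1f} min), "
      f"within 30 min: {within_target}")
```

The result, about 1590.67 s (roughly 26.5 minutes), agrees with the logged value and is within the 30-minute target.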

[Graph: Tablet migration over time]

[Graph: max/avg disk utilization]

Latency
99th percentile write and read latency by cluster (max at 90% disk utilization)

operation	p99 latency
write	3.58 ms
read	2.05 ms

https://jenkins.scylladb.com/job/scylla-staging/job/LakshmipathiGanapathi/job/byo-longevity-test/219/console

pehala · 2024-12-09T07:21:49Z

Closing in favour of #9156

Lakshmipathi self-assigned this Nov 19, 2024

Lakshmipathi added the area/elastic cloud label Nov 19, 2024

Lakshmipathi mentioned this issue Nov 19, 2024 in: Add 90% storage utilization tests #9129 (Open, 13 tasks)

pehala added area/tablets area/serverlessv2 P1 Urgent labels Nov 20, 2024

dani-tweig removed the area/serverlessv2 label Nov 26, 2024

pehala closed this as completed Dec 9, 2024
