Add multi-dc testcase for 90% storage utilization #9157

Open
Tracked by #9305
pehala opened this issue Nov 7, 2024 · 23 comments
Labels: area/elastic cloud (Issues related to the elastic cloud project), area/tablets, P3 (Medium Priority)

Comments

@pehala commented Nov 7, 2024

  • Create a 3-node cluster with rf=3.
  • Reach 90% disk usage.
  • Scale out the cluster by adding a new DC to the existing cluster under load.
  • Bump up the RF to 3 for both DCs (see the CQL sketch below).
  • Measure latency under stress.
  • Add nodes in parallel to both DCs.
  • Measure latency under stress (adjusted for 4 nodes).
  • Scale in the DC under load.
  • Measure latency under stress (adjusted for 3 nodes).
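
A minimal CQL sketch of the RF-bump step, assuming placeholder keyspace and DC names (ks, dc1, dc2). With tablets-enabled keyspaces, only one DC's RF can be changed at a time, and not by more than 1 (see the error later in this thread), so the new DC reaches RF=3 in three steps:

-- Starting point: ks is replicated only in dc1 with RF=3.
ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 1};
ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};
ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};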
@pehala pehala added the area/elastic cloud Issues related to the elastic cloud project label Nov 7, 2024
@pehala pehala assigned Lakshmipathi and unassigned pehala Nov 7, 2024
@pehala pehala added the P3 Medium Priority label Nov 21, 2024
@cezarmoise

Last test run: https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/cezar/job/byo-longevity-test/69/consoleFull

04:34:09  < t:2024-11-21 02:34:06,572 f:full_storage_utilization_test_2.py l:131  c:FullStorageUtilizationTest2 p:INFO  > Node     Total GB     Used GB      Avail GB     Used %  
04:34:29  < t:2024-11-21 02:34:28,756 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 1        436          396          40           91.0%
04:34:51  < t:2024-11-21 02:34:50,943 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 2        436          393          44           90.0%
04:35:13  < t:2024-11-21 02:35:13,134 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 3        436          395          42           91.0%
04:35:36  < t:2024-11-21 02:35:35,329 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 4        436          403          34           93.0%
04:35:36  < t:2024-11-21 02:35:35,851 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 5        436          37           400          9.0%
04:35:58  < t:2024-11-21 02:35:58,183 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 6        436          37           400          9.0%
04:36:21  < t:2024-11-21 02:36:20,536 f:full_storage_utilization_test_2.py l:143  c:FullStorageUtilizationTest2 p:INFO  > 7        436          37           400          9.0%
04:36:21  < t:2024-11-21 02:36:20,536 f:full_storage_utilization_test_2.py l:153  c:FullStorageUtilizationTest2 p:INFO  > Cluster  3052         1698         1360         56.0%

Did not redistribute data to the new DC.

Trying fix 1b0e85b

@cezarmoise

https://argus.scylladb.com/tests/scylla-cluster-tests/e70ab70a-063f-463e-a289-39a90805e597

cassandra.InvalidRequest: Error from server: code=2200 [Invalid query] message="Only one DC's RF can be changed at a time and not by more than 1"

@cezarmoise

https://github.com/cezarmoise/scylla-cluster-tests/tree/new-dc

Trying to alter all keyspaces before adding the DC so they use per-DC replication; that way, the change after adding the DC is only a single change.

https://argus.scylladb.com/tests/scylla-cluster-tests/4cb74447-6750-4bba-83ef-4ccad8cf6a89
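
A minimal sketch of that pre-conversion, using a placeholder keyspace ks and placeholder DC name dc1 (the test iterates over all keyspaces):

-- Before the new DC joins: convert the simple replication_factor into an explicit per-DC map.
ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
-- After the new DC joins, bringing it in is then a single one-step change:
ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 1};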

@cezarmoise commented Nov 22, 2024

Failed due to a timeout on a large keyspace; updating the test to replicate only the small keyspaces.

2024-11-22 13:41:14.531: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=9263a51a-5198-4ce0-a22c-dfaf3d53fec5, source=FullStorageUtilizationTest2.test_scale_out (full_storage_utilization_test_2.FullStorageUtilizationTest2)() message=Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 245, in test_scale_out
self.scale_out()
File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 57, in scale_out
self.add_new_node()
File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 63, in add_new_node
self.reconfigure_keyspaces()
File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 97, in reconfigure_keyspaces
self.execute_cql(cql)
File "/home/ubuntu/scylla-cluster-tests/full_storage_utilization_test_2.py", line 36, in execute_cql
results = session.execute(query)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 1318, in execute_verbose
return execute_orig(*args, **kwargs)
File "cassandra/cluster.py", line 2729, in cassandra.cluster.Session.execute
File "cassandra/cluster.py", line 5120, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'10.4.3.201:9042': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.4.3.201:9042

@cezarmoise

Still timeout issues: https://argus.scylladb.com/tests/scylla-cluster-tests/5117a642-3a7a-4c9a-ba43-d1898756f556

Setting the timeout on queries to 5 minutes and trying again.
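
A minimal sketch of that change, assuming the Python cassandra-driver session used by the test (the timeout argument is the one the OperationTimedOut message points at; the helper name mirrors the traceback but is otherwise illustrative):

# Illustrative version of the test's execute_cql() with an explicit
# per-request timeout instead of the driver default.
QUERY_TIMEOUT = 300  # seconds (5 minutes)

def execute_cql(session, query):
    # cassandra.cluster.Session.execute() accepts a per-call timeout;
    # the OperationTimedOut error suggests raising it here.
    return session.execute(query, timeout=QUERY_TIMEOUT)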

@cezarmoise

[image]

https://argus.scylladb.com/tests/scylla-cluster-tests/6b7ea346-0ca8-42c8-a955-7f7f4f3d1922

Only added the small keyspaces to the new DC, as I got timeouts when trying to alter the large ones.
The big sleeps are removed here to run the test faster.

Will update with a new run.

@cezarmoise

[image]

https://argus.scylladb.com/tests/scylla-cluster-tests/5ab30315-6993-4b6e-8cca-b0b430076eca

Initial Cluster: 4 x i4i.large
Write to 70%; Sleep 30 min
Write to 90%; Sleep 30 min
Writes have RF=3
Add 3 nodes in new DC: 3 x i4i.large
Update all keyspaces with replication dc1: 3, dc2: 1
Sleep 30 minutes

NOTE:
No read/write throttling during scale-out; got some errors:
https://argus.scylladb.com/tests/scylla-cluster-tests/394cc2b3-4719-4034-b93e-38e904335565

@pehala commented Nov 26, 2024

Add 3 nodes in new DC: 3 x i4i.large

Do we know how long it took to provision the new DC?

Update all keyspaces with replication dc1: 3, dc2: 1

Why the different replication? Does the space occupied in DC2 correspond to 70% storage utilization, or should it be higher/lower?

Sleep 30 minutes

Did we verify the new DC works as expected, i.e. with reads or writes?

@cezarmoise

Add 3 nodes in new DC: 3 x i4i.large

Do we know how long it took to provision the new DC?

01:10:18  < t:2024-11-25 23:10:17,822 f:full_storage_utilization_test_2.py l:55   c:FullStorageUtilizationTest2 p:INFO  > Started scale out
01:20:53  < t:2024-11-25 23:20:53,544 f:full_storage_utilization_test_2.py l:68   c:FullStorageUtilizationTest2 p:INFO  > New node(s) added, total nodes in cluster: 7
02:29:00  < t:2024-11-26 00:28:59,828 f:full_storage_utilization_test_2.py l:59   c:FullStorageUtilizationTest2 p:INFO  > Scale out finished with time: 4722.006384372711

10 minutes to add the nodes,
1 hour to update the keyspaces.

Update all keyspaces with replication dc1: 3, dc2: 1

Why the different replication? Does the space occupied in DC2 correspond to 70% storage utilization, or should it be higher/lower?

The RF for the new DC needs to be increased by 1 at a time, so it would take 3x the time.
Will start a new run adding 4 nodes in the new DC with RF=3.

Sleep 30 minutes

Did we verify the new DC works as expected, i.e. with reads or writes?

Currently I get stress errors

2024-11-25 19:48:05.803: (CassandraStressEvent Severity.CRITICAL) period_type=end event_id=75fcf4f7-7d00-4677-8600-56c7d38c9c78 duration=30m5s: node=Node storage-utilization-master-loader-node-394cc2b3-1 [3.253.63.236 | 10.4.0.118] (dc name: eu-west-1)
stress_cmd=cassandra-stress read duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema "replication(strategy=NetworkTopologyStrategy,replication_factor=3)"
errors:
Stress command completed with bad status 1: Failed to connect over JMX; not collecting these stats
java.lang.RuntimeException: Failed to execute stress action
2024-11-25 19:48:03.845: (CassandraStressLogEvent Severity.ERROR) period_type=one-time event_id=75fcf4f7-7d00-4677-8600-56c7d38c9c78: type=IOException regex=java\.io\.IOException line_number=1912 node=Node storage-utilization-master-loader-node-394cc2b3-1 [3.253.63.236 | 10.4.0.118] (dc name: eu-west-1)
java.io.IOException: Operation x0 on key(s) [4b3132355032384c4b30]: Data returned was not validated

I think the stress command needs to be updated because of the new DC.

But in add_new_dc.py the commands are

"cassandra-stress read cl=LOCAL_QUORUM duration=20m -mode cql3 native -rate threads=8 -pop seq=1..20900 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5",
"cassandra-stress write cl=LOCAL_QUORUM duration=20m -mode cql3 native -rate threads=8 -pop seq=1..20900 -col 'n=FIXED(10) size=FIXED(512)' -log interval=5"

without any mention of replication, and I don't know exactly what the difference is.
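
For comparison, a hedged sketch of what a DC-aware version of the failing read command might look like, assuming cassandra-stress passes extra replication() options through to NetworkTopologyStrategy as per-DC factors (unverified) and using placeholder DC names dc1/dc2; cl=LOCAL_QUORUM keeps traffic local, as in the add_new_dc.py commands above:

cassandra-stress read cl=LOCAL_QUORUM duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema "replication(strategy=NetworkTopologyStrategy,dc1=3,dc2=3)"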

@pehala commented Nov 26, 2024

@Lakshmipathi any idea what's wrong?

@Lakshmipathi

@pehala I'm not quite sure why it stopped working with the new DC. Searching existing issues, I came across this one: scylladb/cassandra-stress#16

@Lakshmipathi commented Nov 27, 2024

@cezarmoise, can you share the Jenkins link for this error? I got a similar one in my simple scale-out run (https://jenkins.scylladb.com/job/scylla-staging/job/LakshmipathiGanapathi/job/byo-longevity-test/268/console)

2024-11-25 19:48:05.803: (CassandraStressEvent Severity.CRITICAL) period_type=end event_id=75fcf4f7-7d00-4677-8600-56c7d38c9c78 duration=30m5s: node=Node storage-utilization-master-loader-node-394cc2b3-1 [3.253.63.236 | 10.4.0.118] (dc name: eu-west-1)
stress_cmd=cassandra-stress read duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema "replication(strategy=NetworkTopologyStrategy,replication_factor=3)"
errors:
Stress command completed with bad status 1: Failed to connect over JMX; not collecting these stats
java.lang.RuntimeException: Failed to execute stress action

@cezarmoise commented Nov 27, 2024

@cezarmoise, can you share the Jenkins link for this error? I got a similar one in my simple scale-out run (https://jenkins.scylladb.com/job/scylla-staging/job/LakshmipathiGanapathi/job/byo-longevity-test/268/console)

2024-11-25 19:48:05.803: (CassandraStressEvent Severity.CRITICAL) period_type=end event_id=75fcf4f7-7d00-4677-8600-56c7d38c9c78 duration=30m5s: node=Node storage-utilization-master-loader-node-394cc2b3-1 [3.253.63.236 | 10.4.0.118] (dc name: eu-west-1)
stress_cmd=cassandra-stress read duration=30m -rate threads=16 "throttle=1400/s" -mode cql3 native -pop seq=1..5000000 -col "size=FIXED(10240) n=FIXED(1)" -schema "replication(strategy=NetworkTopologyStrategy,replication_factor=3)"
errors:
Stress command completed with bad status 1: Failed to connect over JMX; not collecting these stats
java.lang.RuntimeException: Failed to execute stress action

https://jenkins.scylladb.com/job/scylla-staging/job/cezar/job/byo-longevity-test/97/
https://argus.scylladb.com/tests/scylla-cluster-tests/394cc2b3-4719-4034-b93e-38e904335565

The Jenkins link probably won't be around for very long; I run a lot of builds.
#97.txt

@cezarmoise commented Nov 27, 2024

@swasik @pehala

[image]

Initial Cluster: 4 x i4i.large
Write to 70%; Sleep 30 min
Write to 90%; Sleep 30 min
Writes have RF=3
Add 4 nodes in new DC: 4 x i4i.large
Update all keyspaces with replication dc1: 3, dc2: 3

The order of operations here is:
ks1 -> dc1: 3, dc2: 1
ks1 -> dc1: 3, dc2: 2
ks1 -> dc1: 3, dc2: 3
ks2 -> dc1: 3, dc2: 1
...

At this point I get an out-of-space error:
https://argus.scylladb.com/tests/scylla-cluster-tests/4b03ebc0-3cb3-42c9-b541-02cdfe736651
https://jenkins.scylladb.com/job/scylla-staging/job/cezar/job/byo-longevity-test/100/

22:59:45  < t:2024-11-26 20:59:35,791 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2024-11-26 20:59:35.780 <2024-11-26 20:59:35.700>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=3c9fae6f-f0f1-47ca-90cb-1a3b14d7eb55: type=NO_SPACE_ERROR regex=No space left on device line_number=4972 node=storage-utilization-master-db-node-4b03ebc0-7
22:59:45 
 < t:2024-11-26 20:59:35,791 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2024-11-26T20:59:35.700+00:00 storage-utilization-master-db-node-4b03ebc0-7      !ERR | scylla[5396]:  [shard 0:strm] storage_service - Shutting down communications due to I/O errors until operator intervention: Disk error: std::system_error (error system:28, No space left on device)

This happened after

ALTER KEYSPACE keyspace_large3 WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-westscylla_node_west': 3, 'eu-west-2scylla_node_west': 3}

When inserting data into the original DC, it was only at 60% capacity after keyspace_large3.

After that there are a lot of errors like this:

2024-11-26 21:32:39.988 <2024-11-26 21:32:39.943>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=1002f7e2-fa1c-4027-9151-e6f32428e106: type=RUNTIME_ERROR regex=std::runtime_error line_number=114732 node=storage-utilization-master-db-node-4b03ebc0-1
2024-11-26T21:32:39.943+00:00 storage-utilization-master-db-node-4b03ebc0-1      !ERR | scylla[5512]:  [shard 0: gms] raft_topology - topology change coordinator fiber got error std::runtime_error (raft topology: exec_global_command(barrier) failed with seastar::rpc::closed_error (connection is closed))
2024-11-26 21:32:39.985 <2024-11-26 21:32:39.943>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=1002f7e2-fa1c-4027-9151-e6f32428e106: type=RUNTIME_ERROR regex=std::runtime_error line_number=114724 node=storage-utilization-master-db-node-4b03ebc0-1
2024-11-26T21:32:39.943+00:00 storage-utilization-master-db-node-4b03ebc0-1      !ERR | scylla[5512]:  [shard 0: gms] raft_topology - drain rpc failed, proceed to fence old writes: std::runtime_error (raft topology: exec_global_command(barrier_and_drain) failed with seastar::rpc::closed_error (connection is closed))

@swasik commented Nov 27, 2024

Initial Cluster: 4 x i4i.large
Write to 70%; Sleep 30 min
Write to 90%; Sleep 30 min
Writes have RF=3
Add 3 nodes in new DC: 3 x i4i.large

Isn't it expected to get out of space if we have 4 nodes at 0.9 utilization and want to make the same number of replicas using just 3 nodes?

@cezarmoise

Initial Cluster: 4 x i4i.large
Write to 70%; Sleep 30 min
Write to 90%; Sleep 30 min
Writes have RF=3
Add 3 nodes in new DC: 3 x i4i.large

Isn't it expected to get out of space if we have 4 nodes at 0.9 utilization and want to make the same number of replicas using just 3 nodes?

My mistake, it should say 4 new nodes. In the graph you can see there are 4 new lines.

@cezarmoise

I will run this again, but with wait_for_tablets_balanced calls in between.
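
A minimal sketch of that plan, assuming the wait_for_tablets_balanced helper mentioned here and illustrative names for everything else (small_keyspaces, execute_cql, node, dc1/dc2):

# Illustrative only: raise the new DC's RF one step at a time and let
# tablet load balancing settle between ALTERs.
for keyspace in small_keyspaces:            # keyspaces being replicated to dc2
    for rf in (1, 2, 3):                    # RF can only move by 1 per ALTER
        execute_cql(
            session,
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': {rf}}}"
        )
        wait_for_tablets_balanced(node)     # helper named in this comment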

@swasik commented Nov 27, 2024

I will run this again, but with wait_for_tablets_balanced calls in between.

Between which operations? I am not sure this is the right approach; the customer is not expected to wait for balancing to finish before scaling DCs.

@bhalevy could you take a look at why we get an out-of-space error here? Or recommend who we should ask?

@cezarmoise

Managed to reproduce the failures. This time, I waited for tablets to balance after altering each keyspace.

[image]

https://argus.scylladb.com/tests/scylla-cluster-tests/a318f810-3ae5-4912-9605-21434e3be97f

[image]

https://argus.scylladb.com/tests/scylla-cluster-tests/6c7cff0a-5fab-482c-8fd7-21499fe35a0e

@swasik commented Dec 2, 2024

@cezarmoise could you create a separate issue describing the bug?

@pehala pehala changed the title Add testcase for adding additional DC while having 90% storage utilization Add multi-dc testcase for 90% storage utilization Dec 9, 2024
@pehala commented Dec 9, 2024

Updated the description & name to match the changes to the test plan.

@cezarmoise

Opened scylladb/scylladb#21848 for the out-of-space issue.

@cezarmoise

[image]

To get it to work I had to make the instances in the new DC much larger than the ones in the old DC, but the test did not wait long enough: after compaction the used space is the same, but not all tables had time to compact.
