Add testcase for scale-out/scale-in while having 90% storage utilization #9156
Older run:
3-node cluster (instance type: i4i.large), scale-out at 90%. Reached 91% disk usage, then waited 30 min with no writes or reads.
After 30 min of idle time, started a throttled write workload:
Scale-out by adding a new node at 90%.
After 30 min, the scaled-out (3->4) cluster shows disk usage of 75%, 74%, 75% and 70%.
Latency
https://argus.scylladb.com/tests/scylla-cluster-tests/c5de2f39-770c-4cf3-8d8c-66fef9d91d87
However, the chart shows average disk usage. That value drops as soon as we add disk space, even if the new space is not used yet. Could you also add a chart of the maximum disk usage across all nodes?
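To illustrate why the average is misleading here, the following minimal sketch assumes every node has the same disk capacity; the node figures and the `cluster_usage` helper are hypothetical. Adding an empty node lowers the average utilization immediately, while the maximum stays unchanged until tablets actually migrate.

```python
# Hypothetical per-node disk usage, as fractions of each node's capacity.
# Assumes all nodes have identical disk size (e.g. the same instance type).

def cluster_usage(per_node):
    """Return (average, maximum) disk utilization across nodes."""
    avg = sum(per_node) / len(per_node)
    return avg, max(per_node)

before = [0.90, 0.90, 0.90]   # 3-node cluster at 90%
after_add = before + [0.0]    # new node joined, no data streamed to it yet

avg_before, max_before = cluster_usage(before)
avg_after, max_after = cluster_usage(after_add)

print(f"before: avg={avg_before:.1%} max={max_before:.1%}")  # avg=90.0% max=90.0%
print(f"after:  avg={avg_after:.1%} max={max_after:.1%}")    # avg=67.5% max=90.0%
```

The maximum is the figure that tells whether any single node is still close to running out of space.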
Interestingly, after migration every node has the same number of tablets, but disk utilization on the new node is about 5% lower. Maybe something has not been cleaned up yet. Could we wait a bit longer to see whether utilization eventually evens out?
Started a new job with a 1 h wait just before the test ends. I will check whether the 5% lower disk usage persists and report back.
@swasik After scale-out, I waited 40 min and confirmed there was 0% load on all nodes. Final disk usage: 66%, 69%, 71% and 73%. So the newly added node has about 5% less disk usage than the average of the other 3 nodes.
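The 5% figure can be checked with a short calculation. This sketch assumes the newly added node is the one reporting 66%, which the comment implies but does not state explicitly; the gap is in percentage points.

```python
# Final disk usage reported after scale-out (percent of each node's disk).
# Assumption: the new node is the one at 66%; the other three are the original nodes.
new_node = 66
original_nodes = [69, 71, 73]

mean_original = sum(original_nodes) / len(original_nodes)  # (69 + 71 + 73) / 3
gap = mean_original - new_node

print(f"mean of original nodes: {mean_original:.1f}%")  # 71.0%
print(f"new node is lower by:   {gap:.1f} points")       # 5.0 points
```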
Could it be due to tablet imbalance?
I thought so too, but every node has exactly the same number of tablets, and the keyspace is probably distributed linearly.
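One way to tell the two explanations apart is to compare tablet counts and on-disk size per node side by side. The sketch below uses made-up tablet sizes and a hypothetical `per_node_stats` helper; it only shows that equal tablet counts translate into equal disk usage when the tablets are roughly the same size, which is what a linear keyspace distribution would give.

```python
from collections import namedtuple

NodeStats = namedtuple("NodeStats", ["tablets", "bytes_gb"])

def per_node_stats(tablet_sizes_gb):
    """Map node -> NodeStats(tablet count, total tablet size in GB)."""
    return {
        node: NodeStats(len(sizes), sum(sizes))
        for node, sizes in tablet_sizes_gb.items()
    }

# Hypothetical layout: every node owns 4 tablets, but sizes differ on one node.
layout = {
    "node1": [10.0, 10.0, 10.0, 10.0],
    "node2": [10.0, 10.0, 10.0, 10.0],
    "node3": [10.0, 10.0, 10.0, 10.0],
    "node4": [9.0, 9.0, 9.0, 9.0],  # e.g. the newly added node
}

stats = per_node_stats(layout)
counts = {s.tablets for s in stats.values()}
sizes = [s.bytes_gb for s in stats.values()]

print("tablet counts equal:", len(counts) == 1)           # True
print(f"byte spread: {max(sizes) - min(sizes):.1f} GB")  # 4.0 GB
```

If the per-node byte totals match while disk usage still differs, the gap is more likely in data outside the tablets (for example, files not yet cleaned up) than in tablet placement.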
@Lakshmipathi Do you have updated results? I looked at Grafana for the first experiment and have the following observations:
@paszkow OK, I'm running the test without throttling now and will update the results.
Updated the description and the name to reflect changes made to the test plan.
Old run: [Argus](https://argus.scylladb.com/tests/scylla-cluster-tests/0c0af1c9-d798-48d1-bfaa-04767ef08d38)
Configuration:
Workload during latency measurements:
Latency
Decommission:
It seems compaction isn't triggered when truncating the entire keyspace (since no tombstones are created). The SCT script is waiting for
Results: [Argus](https://argus.scylladb.com/tests/scylla-cluster-tests/f6516657-0726-4362-9495-afd6177bb0fd)
Configuration:
Workload during latency measurements:
Latency
@swasik The final decommission (4->3) shows a significant increase in latencies, which is not acceptable; we need to investigate what is wrong. We also run this on
Results