cassandra stress thread timeout in get_results doesn't take soft/hard timeout into account #7209

Closed
fruch opened this issue Feb 15, 2024 · 1 comment
fruch commented Feb 15, 2024

Issue description

We see failures like the following when hitting scylladb/java-driver#258:

2024-02-14 18:53:01.572: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=d3f8281c-d18c-421a-ac2b-ad0fa720d903, source=LongevityTest.test_custom_time (longevity_test.LongevityTest)() message=Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/longevity_test.py", line 201, in test_custom_time
    self.verify_stress_thread(cs_thread_pool=stress)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 2093, in verify_stress_thread
    results, errors = cs_thread_pool.verify_results()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/stress_thread.py", line 381, in verify_results
    results = super().get_results()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/stress/base.py", line 88, in get_results
    for future in concurrent.futures.as_completed(self.results_futures, timeout=timeout):
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 241, in as_completed
    raise TimeoutError(
concurrent.futures._base.TimeoutError: 2 (of 2) futures unfinished
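
The last frame of the traceback can be reproduced in isolation with only the standard library. The sketch below is not SCT code: the helper name `reproduce_unfinished_futures` and its parameters are made up for illustration, and `time.sleep` stands in for the stress command. It shows that `as_completed` raises as soon as its own `timeout` elapses, regardless of any timeout applied to the work itself.

```python
import concurrent.futures
import time


def reproduce_unfinished_futures(wait_timeout=0.2, work_time=0.5, workers=2):
    """Return the message of the TimeoutError raised by as_completed().

    Hypothetical helper: time.sleep stands in for a long-running stress command.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(time.sleep, work_time) for _ in range(workers)]
        try:
            # The wait gives up after wait_timeout even though the work is still running.
            for future in concurrent.futures.as_completed(futures, timeout=wait_timeout):
                future.result()
        except concurrent.futures.TimeoutError as exc:
            return str(exc)
    return None


message = reproduce_unfinished_futures()
print(message)  # 2 (of 2) futures unfinished
```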

We added a soft timeout on top of the remote call that executes the stress command, granting 5% extra for the soft timeout
and 10% extra for the hard timeout. However, the timeout attribute actually used in this call isn't updated accordingly, so it
stops the stress command when it is hit.

So basically the soft_timeout part doesn't work as we expect.
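
One way to picture the expected behaviour is to derive the wait timeout from the same padded values. This is only a sketch under assumptions, not the change made in SCT: `padded_timeouts`, `collect_results`, and `slow_stress` are hypothetical names, and the only facts taken from this issue are the 5% and 10% margins and the use of `concurrent.futures.as_completed`.

```python
import concurrent.futures
import time

SOFT_TIMEOUT_FACTOR = 1.05  # 5% extra, as described in this issue
HARD_TIMEOUT_FACTOR = 1.10  # 10% extra, as described in this issue


def padded_timeouts(base_timeout):
    """Return (soft, hard) timeouts derived from the nominal duration (hypothetical helper)."""
    return base_timeout * SOFT_TIMEOUT_FACTOR, base_timeout * HARD_TIMEOUT_FACTOR


def collect_results(futures, base_timeout):
    """Wait on the futures with the hard timeout instead of the unpadded one."""
    _, hard_timeout = padded_timeouts(base_timeout)
    completed = concurrent.futures.as_completed(futures, timeout=hard_timeout)
    return [future.result() for future in completed]


def slow_stress(duration):
    """Stand-in for a stress command that runs slightly longer than planned."""
    time.sleep(duration)
    return "done"


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    # Each task overruns the nominal 2.0s by about 2%, which is still inside the hard timeout.
    futures = [pool.submit(slow_stress, 2.04) for _ in range(2)]
    results = collect_results(futures, base_timeout=2.0)

print(results)
```

With the unpadded `base_timeout` passed straight to `as_completed`, the same run would raise `TimeoutError` before the tasks finish, which matches the failure reported here.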

Packages

Scylla version: 2024.1.1-20240214.948e170d7a26 with build-id 68fe9e43934afe25726f1a9a075fed68d09889bd

Kernel Version: 5.15.0-1054-azure

Installation details

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-7 (52.191.15.173 | 10.0.0.14) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-6 (13.90.145.47 | 10.0.0.10) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-5 (13.90.144.59 | 10.0.0.9) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-4 (13.90.144.37 | 10.0.0.8) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-3 (13.90.147.193 | 10.0.0.7) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-2 (52.170.218.38 | 10.0.0.6) (shards: 7)
  • longevity-10gb-3h-2024-1-db-node-4d371487-eastus-1 (13.90.147.134 | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-2024.1.1-x86_64-2024-02-15T00-14-14 (azure: undefined_region)

Test: longevity-10gb-3h-azure-test
Test id: 4d371487-6236-4a47-981b-78f87df417f1
Test name: enterprise-2024.1/longevity/longevity-10gb-3h-azure-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 4d371487-6236-4a47-981b-78f87df417f1
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 4d371487-6236-4a47-981b-78f87df417f1

Logs:

Jenkins job URL
Argus

@fruch fruch self-assigned this Feb 15, 2024
fruch commented Mar 3, 2024

fixed in #7210

@fruch fruch closed this as completed Mar 3, 2024