The timeout is caused by 2 write stress commands and 13 read stress commands that never finish:

`stress_cmd`:
- The first 3 writes have no `duration` and finish on their own in 2-3 hours.
- The next 2 writes have a `duration` part, which gets overwritten with the correct value.
- The other 13 reads have no `duration` set, so it is never redefined and they don't end.

`stress_read_cmd`:
- The first 2 writes have no `duration`, so it is not updated. They don't end because they are configured to run 20 and 26 iterations of really huge dataset writes. Designed to take ~4 days?
- The next 2 writes had a `duration`; it was updated and they finished correctly.
Here is the code that handles the duration update:
```python
def run_stress_thread_bench(self, stress_cmd, duration=None, round_robin=False, stats_aggregate_cmds=True,
                            stop_test_on_failure=True, **_):
    if duration:
        # An explicit duration argument always wins.
        timeout = self.get_duration(duration)
    elif self._stress_duration and '-duration=' in stress_cmd:
        # stress_duration only replaces a -duration that is already in the command.
        timeout = self.get_duration(self._stress_duration)
        stress_cmd = re.sub(r'\s-duration[=\s]+\d+[mhd]+\s*', f' -duration={self._stress_duration}m ', stress_cmd)
    else:
        # Commands without -duration land here and are left unchanged.
        timeout = get_timeout_from_stress_cmd(stress_cmd) or self.get_duration(duration)
```
As a result, we should either fix the `duration` overwriting logic so that it defines the `duration` when it is absent, or update all the affected test config files to set the `duration` parameter for each stress command that may run too long.
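The first option could look roughly like the sketch below. It is only an illustration, not the actual SCT fix: `with_stress_duration` is a hypothetical helper name, the regex is copied from the excerpt above, and the example commands are made up. It also assumes that appending `-duration` is valid for every affected command, which would need checking against commands that rely on their iteration count.

```python
import re

# Same pattern as the one used in run_stress_thread_bench above.
_DURATION_RE = re.compile(r'\s-duration[=\s]+\d+[mhd]+\s*')


def with_stress_duration(stress_cmd: str, stress_duration_minutes: int) -> str:
    """Return stress_cmd with -duration set to stress_duration_minutes.

    Hypothetical helper: replaces an existing -duration argument,
    or appends one when the command has none.
    """
    new_arg = f' -duration={stress_duration_minutes}m '
    if _DURATION_RE.search(stress_cmd):
        # Existing behaviour: overwrite the duration already in the command.
        return _DURATION_RE.sub(new_arg, stress_cmd).strip()
    # Proposed behaviour: define the duration when it is absent.
    return f'{stress_cmd.rstrip()}{new_arg}'.strip()


# Illustrative commands only, not taken from the real test config.
print(with_stress_duration('scylla-bench -mode=write -duration=5760m -workload=sequential', 1440))
print(with_stress_duration('scylla-bench -mode=read -iterations=0', 1440))
```

With something like this in place, the `elif` branch would no longer need the `'-duration=' in stress_cmd` check, so commands without a duration would also be capped by `stress_duration`.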
Issue description
The `longevity-large-partition-200k-pks-4days-gce-test` CI job has a big set of stress commands:
Then we set `stress_duration` to `1440m` (`1d`).
As a result, the test run times out, exceeding the test time limit:
Impact
SCT stress commands don't finish before the test timeout.
How frequently does it reproduce?
100%
Installation details
Kernel Version: 5.15.0-1048-gcp
Scylla version (or git commit hash): `5.5.0~dev-20240119.b1ba904c4977` with build-id `7a5829efb1f6ef7b467d2dc837300abcc0b739c8`
Cluster size: 5 nodes (n2-highmem-16)
Scylla Nodes used in this run:
OS / Image: `https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/scylla-5-5-0-dev-x86-64-2024-01-20t02-19-13` (gce: undefined_region)
Test: `longevity-large-partition-200k-pks-4days-gce-test`
Test id: `ea244d4e-60ba-40a2-8cf3-80b280fc98ba`
Test name: `scylla-master/longevity/longevity-large-partition-200k-pks-4days-gce-test`
Test config file(s):
Logs and commands
$ hydra investigate show-monitor ea244d4e-60ba-40a2-8cf3-80b280fc98ba
$ hydra investigate show-logs ea244d4e-60ba-40a2-8cf3-80b280fc98ba
Logs:
Jenkins job URL
Argus