fix(decommissionstreamerr): Set valid decommission nodetool timeout #7144

Merged (1 commit) on Jan 25, 2024

@aleksbykov aleksbykov commented Jan 25, 2024

The timeout for nodetool decommission in the DecommissionStreamErr nemesis was incorrectly set to 180 seconds, while the decommission process can run much longer. When the decommission is supposed to be aborted after a log message that is expected near the end of the decommission process, the nodetool decommission command is terminated by the timeout too early. The subsequent nemesis logic then fails because the decommissioning node is still in UL status, since the decommission itself continues to run on the node.

Job which failed: https://argus.scylladb.com/test/a5d1f97b-064a-40ed-a517-70e2092b51c2/runs?additionalRuns%5B%5D=38e1b036-3163-4f2a-92f4-5f66f3b0a116

Set valid waiting timeouts for the commands and the ParallelObject so that the decommission is aborted correctly.
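The relationship between the timeouts can be illustrated with a small, self-contained sketch. It is not the code from this PR: the names `DecommissionTimeouts` and `derive_timeouts` are hypothetical, and the way the 600-second margin is applied is an assumption that only echoes the figure mentioned in the review. The point it shows is that the command timeout should outlast the log watcher, and the parallel runner should outlast the command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecommissionTimeouts:
    """Hypothetical container for the three timeouts involved (all in seconds)."""
    watcher: int   # how long to wait for the expected log message
    command: int   # how long the decommission command itself may run
    parallel: int  # how long the runner wrapping both tasks may run


def derive_timeouts(expected_decommission_sec: int, margin_sec: int = 600) -> DecommissionTimeouts:
    """Derive timeouts so that each outer layer outlasts the layer it wraps.

    Assumption: a fixed margin is added at each level. The real nemesis may
    compute these values differently.
    """
    if expected_decommission_sec <= 0 or margin_sec <= 0:
        raise ValueError("durations must be positive")
    watcher = expected_decommission_sec
    command = watcher + margin_sec   # command must not time out before the watcher reacts
    parallel = command + margin_sec  # runner must not time out before the command returns
    return DecommissionTimeouts(watcher=watcher, command=command, parallel=parallel)


if __name__ == "__main__":
    print(derive_timeouts(expected_decommission_sec=7200))
```

With a fixed 180-second command timeout, any decommission that streams for longer than three minutes would lose its client command while the node keeps decommissioning, which matches the UL status described above.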

Fixes issue #7067

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)


@roydahan roydahan left a comment

LGTM.
Not sure why you chose to add only 600s, but if needed we can set it to a higher number.


@fruch fruch left a comment

LGTM


@fruch fruch left a comment

LGTM

@fruch fruch merged commit f17ed34 into scylladb:master Jan 25, 2024
6 checks passed
@fruch fruch added the backport/2024.1-done, backport/5.4-done, and backport/2023.1-done labels Jan 25, 2024
Labels

  • backport/5.4-done: Commit backported to 5.4
  • backport/5.4: Need backport to 5.4
  • backport/2023.1-done: Commit backported to 2023.1
  • backport/2024.1-done: Commit backported to 2024.1
  • backport/2024.1: Need backport to 2024.1
  • Ready for review
3 participants