
[Backport 2024.2] fix(update_db_packages): fix issues with hanging threads during update #8082

Merged: 1 commit merged into branch-2024.2 from mergify/bp/branch-2024.2/pr-7692 on Jul 22, 2024


mergify[bot] commented on Jul 22, 2024

The update_db_packages procedure updates Scylla packages in a separate thread for each node and uses the sdcm.utils.decorators.retrying mechanism for retries. Through this decorator, the UnexpectedExit exception is allowed to occur up to the specified number of times while packages are being installed; after that, the UnexpectedExit exception should be re-raised.
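A retry decorator of this kind can be sketched as follows. This is a simplified, hypothetical illustration, not the actual sdcm.utils.decorators.retrying code: the parameter names n, sleep_time and allowed_exceptions, the install_packages function, and the local UnexpectedExit class (a stand-in for the real exception) are assumptions made to keep the example self-contained.

```python
import functools
import time


class UnexpectedExit(Exception):
    """Hypothetical local stand-in for the real UnexpectedExit exception."""


def retrying(n=3, sleep_time=0.0, allowed_exceptions=(Exception,)):
    """Hypothetical sketch: retry the call up to n times, then re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, n + 1):
                try:
                    return func(*args, **kwargs)
                except allowed_exceptions:
                    if attempt == n:
                        raise  # retries exhausted: propagate to the caller
                    time.sleep(sleep_time)  # back off before the next attempt
        return wrapper
    return decorator


calls = []  # records every attempt, for illustration


@retrying(n=3, allowed_exceptions=(UnexpectedExit,))
def install_packages(node):
    # Hypothetical installer that always fails, so every retry is used up.
    calls.append(node)
    raise UnexpectedExit(f"package install failed on {node}")


try:
    install_packages("node-1")
except UnexpectedExit as exc:
    print(f"gave up after {len(calls)} attempts: {exc}")
```

When called from the main thread, the final raise reaches the caller's except block as expected; the problem described in this PR appears only when the same call runs inside a worker thread.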

For some reason, the re-raise doesn't return control to the caller when the update function runs in a threading.Thread. So if a package-update thread keeps failing and all retries are exhausted, the thread hangs.
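The PR does not pin down the root cause, but one general Python behavior is consistent with the symptom: an exception raised inside a threading.Thread target is not propagated to the thread that calls join(). It is handed to threading.excepthook (which by default prints the traceback to stderr) and the worker thread simply ends. This sketch assumes, hypothetically, that the caller collects per-node results through a queue.Queue; the PR does not show the original code, so install_packages, update_node and the queue are illustrative only.

```python
import queue
import threading


class UnexpectedExit(Exception):
    """Hypothetical stand-in for the exception raised once retries are exhausted."""


def install_packages(node):
    # Hypothetical: simulates a package install whose retries are all used up.
    raise UnexpectedExit(f"package install failed on {node}")


def update_node(node, results):
    install_packages(node)
    results.put(node)  # never reached: the exception above ends the thread


results = queue.Queue()
thread = threading.Thread(target=update_node, args=("node-1", results))
thread.start()
thread.join()  # returns normally; the exception is NOT re-raised here

print("thread alive:", thread.is_alive())
print("results queued:", results.qsize())

try:
    # A caller waiting for this node's result sees nothing arrive.
    results.get(timeout=0.1)
except queue.Empty:
    print("no result; without a timeout, get() would block forever")
```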

The change switches from threading.Thread to concurrent.futures.ThreadPoolExecutor for parallelizing the package update across nodes. This approach doesn't have the issue with re-raising exceptions in threads.
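A minimal sketch of the executor-based pattern, assuming a hypothetical update_node function that fails on one node (this is not the PR's actual code): Future.result() re-raises the exception from the worker thread in the calling thread, so a failed update surfaces to the caller instead of being lost.

```python
from concurrent.futures import ThreadPoolExecutor


class UnexpectedExit(Exception):
    """Hypothetical stand-in for the exception raised once retries are exhausted."""


def update_node(node):
    # Hypothetical per-node update: fails on node-2, succeeds elsewhere.
    if node == "node-2":
        raise UnexpectedExit(f"update failed on {node}")
    return f"{node} updated"


nodes = ["node-1", "node-2", "node-3"]
with ThreadPoolExecutor(max_workers=len(nodes)) as executor:
    # Leaving the with-block waits for all submitted futures to finish.
    futures = {executor.submit(update_node, node): node for node in nodes}

updated, failures = [], {}
for future, node in futures.items():
    try:
        updated.append(future.result())  # re-raises the worker's exception here
    except UnexpectedExit as exc:
        failures[node] = str(exc)

print(updated)
print(failures)
```

Collecting failures per node is one design choice; calling future.result() without the try/except would instead propagate the first failure straight to the caller.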

Related tasks: #7162 and #7160

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

(cherry picked from commit 076bf83)

Parent PR: #7692

The update_db_packages procedure updates Scylla packages in a separate thread for each node
and uses the sdcm.utils.decorators.retrying mechanism for retries. Through this decorator, the
UnexpectedExit exception is allowed to occur up to the specified number of times while installing
packages; after that, the UnexpectedExit exception should be re-raised.

For some reason, the re-raise doesn't return control to the caller if the update function runs in
a threading.Thread. In this case, a thread whose package update keeps failing hangs after all
retries are exhausted, until the test times out.

The change switches from threading.Thread to concurrent.futures.ThreadPoolExecutor for
parallelising the package update on nodes; the executor doesn't have issues with re-raising
exceptions from threads.

(cherry picked from commit 076bf83)
mergify[bot] assigned dimakr on Jul 22, 2024
fruch merged commit 2c70d3d into branch-2024.2 on Jul 22, 2024
7 checks passed
mergify[bot] deleted the mergify/bp/branch-2024.2/pr-7692 branch on July 22, 2024 at 14:36