
fix(scan_operations): add retry policy to cql query #9600

Open
wants to merge 2 commits into base: master

aleksbykov
Contributor

@aleksbykov aleksbykov commented Dec 22, 2024

The node where the scan operation was started could be
targeted by a disruptive nemesis. If the node was restarted or stopped
while the scan query was running, the scan operation was
terminated, and the resulting error event and message marked
the test as failed.

Add ExponentialBackoffRetryPolicy to the CQL session.
It lets the query be retried while the node is down,
so once the node is back, the query finishes successfully.

Fixes: #9284
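To illustrate the idea behind the fix, here is a minimal, driver-independent sketch of capped exponential backoff in plain Python. It is not the driver's ExponentialBackoffRetryPolicy: the real policy plugs into the driver's retry-decision hooks rather than wrapping a callable, and `NodeUnavailable`, `run_with_retries`, and the `max_retries`/`min_interval`/`max_interval` defaults are hypothetical names chosen only for this example.

```python
import time


class NodeUnavailable(Exception):
    """Hypothetical stand-in for the error raised while the target node is down."""


def backoff_delays(max_retries, min_interval=0.1, max_interval=10.0):
    """Yield one delay per allowed retry, doubling each time and capped at max_interval."""
    for attempt in range(max_retries):
        yield min(max_interval, min_interval * 2 ** attempt)


def run_with_retries(query, max_retries=5, min_interval=0.1, max_interval=10.0, sleep=time.sleep):
    """Run query(); while the node is unavailable, wait and retry until retries run out."""
    delays = backoff_delays(max_retries, min_interval, max_interval)
    while True:
        try:
            return query()
        except NodeUnavailable:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: re-raise the last error, failing the scan
            sleep(delay)  # give the restarted node time to come back


# Usage: the "node" is down for the first two attempts, then the scan succeeds.
attempts = []


def flaky_scan():
    attempts.append(1)
    if len(attempts) <= 2:
        raise NodeUnavailable("node is restarting")
    return "scan finished"


waits = []
print(run_with_retries(flaky_scan, sleep=waits.append))  # scan finished
print(waits)  # [0.1, 0.2]
```

Injecting `sleep` keeps the sketch testable without real waiting; the cap on the delay keeps a long outage from producing unbounded pauses between attempts.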

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under the unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@aleksbykov aleksbykov requested a review from fruch December 22, 2024 10:21
@aleksbykov aleksbykov added the backport/6.2, backport/2024.2, and backport/6.1 labels Dec 22, 2024
@aleksbykov aleksbykov marked this pull request as ready for review December 23, 2024 02:54
@@ -460,6 +474,9 @@ def execute_query(self, session, cmd: str,
| FullPartitionScanReversedOrderEvent]) -> None:
self.log.debug('Will run command %s', cmd)
validate_mapreduce_service_requests_start_time = time.time()
session.cluster.default_retry_policy = ExponentialBackoffRetryPolicy(**self._exp_backoff_retry_policy_params)
session.default_timeout = self._session_execution_timeout
Contributor

Is there a difference between self._session_execution_timeout and self._request_default_timeout? Maybe one of them could be reused?

@fruch
Contributor

fruch commented Dec 23, 2024

Is this a replacement for @temichus's attempts in #9370?

@@ -120,6 +125,8 @@ def execute_query(
| FullPartitionScanReversedOrderEvent]) -> ResultSet:
# pylint: disable=unused-argument
self.log.debug('Will run command %s', cmd)
session.cluster.default_retry_policy = ExponentialBackoffRetryPolicy(**self._exp_backoff_retry_policy_params)
Contributor

@fruch commented Dec 23, 2024

This is a bit odd: it sits next to the code that executes the query, not the code that creates the session.

I would recommend consolidating the session-creation code into something like:

from contextlib import contextmanager

@contextmanager
def cql_connection(self, **kwargs):
    # Open a patient CQL session and apply the shared retry policy and
    # request timeout in one place, so every caller gets the same settings.
    with self.fullscan_params.db_cluster.cql_connection_patient(
            node=self.db_node,
            user=self.fullscan_params.user,
            password=self.fullscan_params.user_password,
            **kwargs) as session:
        session.cluster.default_retry_policy = ExponentialBackoffRetryPolicy(**self._exp_backoff_retry_policy_params)
        session.default_timeout = self._request_default_timeout
        yield session

There are way too many repetitions of applying this retry, and it should apply across the board to all of the sessions.
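The consolidation suggested above can be sketched as a self-contained toy. `FakeCluster`, `FakeSession`, `patient_connection`, `ScanOperation`, and `retry_policy_factory` are hypothetical stand-ins for the SCT and driver objects, not their real APIs. The helper is wrapped with `contextlib.contextmanager` because a generator consumed by a `with` statement needs that wrapper, and a `@property` getter cannot accept `**kwargs`.

```python
from contextlib import contextmanager


class FakeCluster:
    """Hypothetical stand-in for the driver's Cluster object."""
    def __init__(self):
        self.default_retry_policy = None


class FakeSession:
    """Hypothetical stand-in for a CQL session."""
    def __init__(self):
        self.cluster = FakeCluster()
        self.default_timeout = 10.0
        self.closed = False


@contextmanager
def patient_connection(**kwargs):
    """Hypothetical stand-in for cql_connection_patient(): always closes the session."""
    session = FakeSession()
    try:
        yield session
    finally:
        session.closed = True


class ScanOperation:
    def __init__(self, retry_policy_factory, request_default_timeout):
        self._retry_policy_factory = retry_policy_factory
        self._request_default_timeout = request_default_timeout

    @contextmanager
    def cql_connection(self, **kwargs):
        # The only place sessions are configured: every caller gets the same
        # retry policy and timeout without repeating the two assignments.
        with patient_connection(**kwargs) as session:
            session.cluster.default_retry_policy = self._retry_policy_factory()
            session.default_timeout = self._request_default_timeout
            yield session

    def execute_query(self, cmd):
        with self.cql_connection() as session:
            # A real implementation would run session.execute(cmd) here.
            return cmd, session


op = ScanOperation(retry_policy_factory=lambda: "exp-backoff", request_default_timeout=300)
cmd, used_session = op.execute_query("SELECT * FROM ks.t")
print(used_session.cluster.default_retry_policy, used_session.default_timeout)  # exp-backoff 300
print(used_session.closed)  # True
```

With this shape, query code never touches the retry policy or timeout directly, so the settings cannot drift between call sites.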

@temichus
Contributor

Quoting @fruch: "Is this a replacement for @temichus's attempts in #9370?"

yes

Labels
backport/6.1 (Need backport to 6.1), backport/6.2, backport/2024.2 (Need backport to 2024.2)
Development

Successfully merging this pull request may close these issues.

Fix Fullscanoperation thread to choose only alive node
4 participants