[#25408] DocDB: Fix MasterFailoverTestIndexCreation/MasterFailoverTestIndexCreation.TestPauseAfterCreateIndexIssued/1 in TSAN

Summary:
There is an expected data race when we try to detect a stuck RPC call.
A stuck call itself should not happen, however; in this particular case it was caused by TSAN slowness, so the call was not actually stuck.
Fixed by increasing the stuck call detection threshold in TSAN.
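
To illustrate the threshold change, here is a minimal standalone sketch of a stuck-call check whose threshold is scaled by a build-dependent multiplier. It only mirrors the shape of the condition in Peer::SignalRequest. The names ScaledThreshold and IsCallStuck are hypothetical, std::chrono::steady_clock stands in for the coarse clock used in the real code, and the multiplier is passed as a parameter because the actual value of yb::kTimeMultiplier in a TSAN build is not shown in this commit.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Hypothetical helper: scales a base threshold by a build-dependent multiplier.
// In the commit this role is played by yb::kTimeMultiplier.
constexpr milliseconds ScaledThreshold(milliseconds base, int time_multiplier) {
  return base * time_multiplier;
}

// Hypothetical helper mirroring the condition in the diff: a call counts as
// stuck only if it was started, has not finished, and has been outstanding
// for longer than timeout + stuck_threshold.
bool IsCallStuck(Clock::time_point now, Clock::time_point last_rpc_start_time,
                 milliseconds timeout, milliseconds stuck_threshold,
                 bool finished) {
  assert(timeout.count() >= 0 && stuck_threshold.count() >= 0);
  return last_rpc_start_time != Clock::time_point::min() &&
         now > last_rpc_start_time + stuck_threshold + timeout && !finished;
}
```

With a 3 s timeout and the 10 s base threshold, a call outstanding for 15 s is reported as stuck when the multiplier is 1, but not when the multiplier is 3 (an assumed value), which is the effect the fix relies on for slow sanitizer builds.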

Also fixed an issue with a double call to Transferred during connection shutdown.
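
The double notification can be sketched with a toy model. This is not the real yb::rpc::Connection: OutboundItem and SketchConnection are hypothetical, statuses are plain strings, and the sketch assumes (as the removed comment in connection.cc suggests) that a failing OutboundQueued has already triggered a shutdown that notifies every queued item.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an outbound call; counts completion notifications.
struct OutboundItem {
  int transferred_calls = 0;
  void Transferred(const std::string& /*status*/) { ++transferred_calls; }
};

// Hypothetical, heavily simplified connection.
class SketchConnection {
 public:
  explicit SketchConnection(bool fail_on_queue) : fail_on_queue_(fail_on_queue) {}

  // Returns an empty string on success, or an error description.
  // notify_in_caller = true reproduces the removed pattern.
  std::string Queue(std::shared_ptr<OutboundItem> item, bool notify_in_caller) {
    assert(item != nullptr);
    queue_.push_back(item);
    std::string status = OutboundQueued();
    if (!status.empty() && notify_in_caller) {
      // Removed pattern: Shutdown() below has already notified this item,
      // so this is a second Transferred call.
      item->Transferred(status);
    }
    return status;  // New pattern: only propagate the error.
  }

 private:
  std::string OutboundQueued() {
    if (!fail_on_queue_) return "";
    Shutdown("connection shut down");
    return "connection shut down";
  }

  // Assumed behavior: shutdown notifies every queued item exactly once.
  void Shutdown(const std::string& status) {
    for (auto& item : queue_) item->Transferred(status);
    queue_.clear();
  }

  bool fail_on_queue_;
  std::vector<std::shared_ptr<OutboundItem>> queue_;
};
```

In this model, queueing on a failing connection with notify_in_caller set to true leaves the item with two Transferred calls, while simply returning the status leaves it with one.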

Test Plan: ./yb_build.sh tsan --cxx-test integration-tests_master_failover-itest --gtest_filter MasterFailoverTestIndexCreation/MasterFailoverTestIndexCreation.TestPauseAfterCreateIndexIssued/1 -n 40 -- -p 6

Reviewers: hsunder

Reviewed By: hsunder

Subscribers: ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D40848
spolitov committed Dec 23, 2024
1 parent 615ae5b commit 6299bad
Showing 2 changed files with 3 additions and 8 deletions.
4 changes: 2 additions & 2 deletions src/yb/consensus/consensus_peers.cc
@@ -101,7 +101,7 @@ DEFINE_test_flag(int32, delay_removing_peer_with_failed_tablet_secs, 0,
"indicating that a tablet is in the FAILED state, and before marking this peer "
"as failed.");

-DEFINE_RUNTIME_int32(consensus_stuck_peer_call_threshold_ms, 10000,
+DEFINE_RUNTIME_int32(consensus_stuck_peer_call_threshold_ms, 10000 * yb::kTimeMultiplier,
"Time to wait after timeout before considering an RPC call as stuck.");
TAG_FLAG(consensus_stuck_peer_call_threshold_ms, advanced);

@@ -185,7 +185,7 @@ Status Peer::SignalRequest(RequestTriggerMode trigger_mode) {
auto last_rpc_start_time = last_rpc_start_time_.load(std::memory_order_acquire);
if (last_rpc_start_time != CoarseTimePoint::min() &&
now > last_rpc_start_time + stuck_threshold + timeout && !controller_.finished()) {
-LOG_WITH_PREFIX(INFO) << Format(
+LOG_WITH_PREFIX(DFATAL) << Format(
"Found an RPC call in stuck state - timeout: $0, last_rpc_start_time: $1, "
"stuck threshold: $2, force recover: $3, call state: $4",
timeout, last_rpc_start_time, stuck_threshold,
7 changes: 1 addition & 6 deletions src/yb/rpc/connection.cc
@@ -413,12 +413,7 @@ Result<size_t> Connection::DoQueueOutboundData(OutboundDataPtr outbound_data, bo
}

if (!batch) {
-s = OutboundQueued();
-if (!s.ok()) {
-outbound_data->Transferred(s, shared_from_this());
-// The connection shutdown has already been triggered by OutboundQueued.
-return s;
-}
+RETURN_NOT_OK(OutboundQueued());
}

return *result;