Skip to content

Commit

Permalink
[BACKPORT 2.14][#18770] DocDB: Stricter memory order in PreparerImpl:…
Browse files Browse the repository at this point in the history
…:Run to prevent re-ordering of instructions

Summary:
Original commit: 8f9d006 / D28403
In the current implementation of Preparer, we could get into a state where we don't submit a new task of type `PreparerImpl::Run` assuming that the current running thread would take care of the newly inserted item. But the current thread running `PreparerImpl::Run` could return without processing the new item from the `queue_` IF the compiler/cpu ends up re-ordering the below two operations in `PreparerImpl::Run` (not the whole `if`, but just the conditional expression)
```
running_.store(false, std::memory_order_release);          <- 1
if (active_tasks_.load(std::memory_order_acquire)) ...    <- 2
```
There is no `memory_order_acq_rel` or a stricter ordering that prevents this possible re-ordering by the compiler/cpu. We suspect something similar might have happened in #18770, which led to a stuck write op at the leader.
Some context on the workload - a tablet hosts a single key, and all transactions try to update the same key. in a couple of runs, we could see that the workload stalls and all transactions fail with inability to obtain locks. we believe this is due to the stuck op at the leader (with acquired shared in-mem locks), and no current/scheduled execution of `PreparerImpl::Run`.

elaborating how we could get into the stuck state, consider a thread that is about to increment `active_tasks_` in `PreparerImpl::Submit` (call it Producer, P). Assuming the the thread that is currently running `PreparerImpl::Run`, executed `ProcessAndClearLeaderSideBatch()` and re-ordered 1 and 2 above, and executed 2 first (call it Consumer, C).

```
C reads active_tasks_ as 0                                                           <- 2
P increments active tasks, sees running_ is true, returns
C sets running_ to false                                                                <- 1
C evaluates if clause with previously read active_tasks_ value and decides to return
```
After this point, we end up in a state where there is an item to be processed, but no enqueued task. We obtained a core from one of the stuck universes, in which we could observe that `active_tasks_` is 1, `running_` is false, and there is no active current thread executing `PreparerImpl::Run`.

Changing it to the below would prevent any such possible re-ordering
```
running_.exchange(false, std::memory_order_acq_rel);
if (active_tasks_.load(std::memory_order_acquire) ...
```
since we use `memory_order_acq_rel`, and read/write after/before it cannot be re-ordered.
Jira: DB-7653

Test Plan: Jenkins

Reviewers: mbautin, rsami, sergei

Reviewed By: rsami

Subscribers: ybase, rthallam

Differential Revision: https://phorge.dev.yugabyte.com/D28591
  • Loading branch information
basavaraj29 committed Sep 18, 2023
1 parent adbbb45 commit 988f2fe
Showing 1 changed file with 21 additions and 9 deletions.
30 changes: 21 additions & 9 deletions src/yb/tablet/preparer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -211,16 +211,28 @@ void PreparerImpl::Run() {
}
ProcessAndClearLeaderSideBatch();
std::unique_lock<std::mutex> stop_lock(stop_mtx_);
running_.store(false, std::memory_order_release);
// Check whether tasks were added while we were setting running to false.
if (active_tasks_.load(std::memory_order_acquire)) {
// Got more operations, try stay in the loop.
bool expected = false;
if (running_.compare_exchange_strong(expected, true, std::memory_order_acq_rel)) {
continue;
// If load of active_tasks_ below is re-ordered by the compiler/cpu and gets to execute before
// exchange of running_, we could get into a state where there is a new item enqueued on queue_
// but no newly submitted task of type PreparerImpl::Run. And the current thread executing
// PreparerImpl::Run could return as well which would put us in a "stuck" state.
//
// Use of acquire-release ordering for running_.exchange prevents any subsequent atomic
// operations in this thread from being re-ordered before the state of running_ is set to false,
// thus preventing the situation described above.
if (PREDICT_TRUE(running_.exchange(false, std::memory_order_acq_rel))) {
// Check whether tasks were added while we were switching running to false.
if (active_tasks_.load(std::memory_order_acquire)) {
// Got more operations, try stay in the loop.
bool expected = false;
if (running_.compare_exchange_strong(expected, true, std::memory_order_acq_rel)) {
continue;
}
// If someone else has flipped running_ to true, we can safely exit this function because
// another task is already submitted to the same token.
}
// If someone else has flipped running_ to true, we can safely exit this function because
// another task is already submitted to the same token.
} else {
LOG(DFATAL) << "running_ is false when there's an active thread executing PreparerImpl::Run. "
<< "Getting into this state may affect throughput or halt workloads.";
}
if (stop_requested_.load(std::memory_order_acquire)) {
VLOG(2) << "Prepare task's Run() function is returning because stop is requested.";
Expand Down

0 comments on commit 988f2fe

Please sign in to comment.