raft: consolidate all append message sending #134
raft.go
Outdated
Type: pb.MsgApp,
Index: pr.Next - 1,
LogTerm: prevTerm,
Commit: r.raftLog.committed,
I'm noticing the odd asymmetry between `MsgApp` sending the latest `Commit` index and the follower computing `min(committed, lastnewi)`, and `MsgHeartbeat` carefully sending `min(pr.Match, r.raftLog.committed)` to avoid a follower-side panic. Do you have thoughts about that?
Yes. The commit index in `MsgApp` is "optimistic", and can be larger than `pr.Match`. As a result, a follower can simultaneously append and commit entries. Technically, the `Commit` index can even exceed the last index of the entries in the batch. The way I think about `MsgApp` is that it's effectively 2 independent messages:
1. A logSlice of the leader log to append to the follower log.
2. A `(term, commit)` pair: the point in the `leader@term` log that is committed (see "The notion of Term is overloaded" #144, which explains the "leader log" coordinate system).
As such, even if the append (1) is redundant/no-op (because the follower already has these entries from elsewhere), the commit index (2) can still communicate valuable information: if the follower log is at the same term, it can advance the commit index safely. We don't take advantage of this currently (the commit index is advanced only within the bounds of the appended entries (1)), but we should.
The commit index in `MsgHeartbeat` is "pessimistic" and cut at `pr.Match` because currently the follower panics if the index is not within the bounds of the follower log. This pessimistic cut and the panic are unnecessary though; see #138 and #139, which aim to relax them.
I would like to clean up the semantics of the `Commit` indices in both messages, so that both are treated as (2) above: a `(term, index)` that the leader sends without relying on any knowledge about the follower log. The follower has enough knowledge to apply (or ignore) it safely.
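The asymmetry discussed in this thread, and the proposed follower-side rule, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the etcd-io/raft code: the `follower` type, its `logTerm` field (the term of the leader whose log the follower's log is known to be consistent with), and all function names are hypothetical.

```go
package main

import "fmt"

func minU64(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// appCommit models the "optimistic" commit index carried by MsgApp: the
// leader's commit index as-is, even if it exceeds the follower's match index.
func appCommit(leaderCommitted uint64) uint64 { return leaderCommitted }

// heartbeatCommit models the "pessimistic" commit index carried by
// MsgHeartbeat: it is cut at the follower's known match index.
func heartbeatCommit(match, leaderCommitted uint64) uint64 {
	return minU64(match, leaderCommitted)
}

// follower is a hypothetical, simplified view of the follower's log state.
type follower struct {
	logTerm   uint64 // assumed: term of the leader log this log is consistent with
	lastIndex uint64
	committed uint64
}

// applyCommit treats (term, commit) as a statement about the leader@term log.
// It advances the commit index only if the follower log is at the same term,
// and never past the follower's last index or backwards.
func (f *follower) applyCommit(term, commit uint64) {
	if term != f.logTerm {
		return // the follower log may diverge from the leader@term log: ignore
	}
	if c := minU64(commit, f.lastIndex); c > f.committed {
		f.committed = c
	}
}

func main() {
	f := follower{logTerm: 5, lastIndex: 10, committed: 7}
	f.applyCommit(5, appCommit(12)) // same term: capped at lastIndex
	fmt.Println(f.committed)
	f.applyCommit(4, 20) // different term: ignored
	fmt.Println(f.committed)
	fmt.Println(heartbeatCommit(8, 12))
}
```

With a rule like `applyCommit`, the leader would not need to cut the heartbeat commit index at `pr.Match`, because an out-of-bounds index is capped or ignored on the follower side instead of triggering a panic.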
@ahrtr PTAL. Also asking @bdarnell to review (@nvanbenschoten and @erikgrinaker can't look at it this week).
Looks reasonable to me, although I'm very far removed from this code and I'm not sure I'm qualified to give it a thorough review. In particular I can see from the tests that there are behavior changes here, but I can't say for sure whether they're all safe.
One thing this highlights for me is how the piggybacking of (unacknowledged) commit indexes onto `MsgApp` is a source of trouble. I've been tempted in the past to introduce something like a new pair of messages `MsgCommitIndex`/`MsgCommitIndexResp` so the leader always knows the commit index that the follower has acknowledged, and we can reduce some of the special cases around `MsgApp`. I don't know if the time is right for that but it's something to consider as we're improving flow control here.
Will take a closer look this week. A general comment: to make the review easier, I suggest breaking this PR down. Such as,
}
pr.PauseMsgAppProbes(true)
It seems that this changes the original logic. In the previous implementation, if `pr.State == StateReplicate && !pr.Inflights.Full()` is `true`, then it will not pause the flow. Your new implementation will pause the flow in this case. Is this an intentional change?
This PR modifies the meaning of the `MsgAppFlowPaused` bool, but the visible behaviour of `MsgApp` sends is not changed.
With this PR, the `pr.Inflights.Full()` check just moved to a different place: see the `CanSendEntries` and `ShouldSendAppend` funcs.
It's not super intentional. I tried a few different ways, and didn't find one that doesn't modify the meaning of this bool at all. This bool is not used by user code though, and we should unexport it IMO.
return pr.CanBumpCommit(commit) ||
	pr.Match < last && (!pr.MsgAppProbesPaused || pr.CanSendEntries(last))
The condition is a little overcomplicated. Is it possible that both `pr.MsgAppProbesPaused` and `pr.CanSendEntries(last)` are `true`? Might `ShouldSendMsgApp` still return `true` in such a case?
Yes, it's possible; we set `MsgAppProbesPaused` after every message send, for instance. `MsgAppProbesPaused == true` means: if we are in the probing/throttled mode, don't send a message. But if we are not in this mode (`CanSendEntries` is true) then `MsgAppProbesPaused` is effectively ignored.
We could keep `MsgAppProbesPaused` in sync with `CanSendEntries`, but it would require some additional invariants.
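The interplay between `MsgAppProbesPaused` and `CanSendEntries` described in this thread can be illustrated with a small self-contained sketch. Only the final boolean expression is taken from the diff quoted above; the `progress` struct, its `sentCommit` and `inflightsFull` fields, and the bodies of `canBumpCommit` and `canSendEntries` are simplified assumptions made for illustration, not the actual `tracker.Progress` implementation.

```go
package main

import "fmt"

type state int

const (
	stateProbe state = iota
	stateReplicate
)

// progress is a hypothetical, simplified stand-in for the leader's view of a
// follower.
type progress struct {
	state              state
	match, next        uint64
	sentCommit         uint64 // assumed: highest commit index already sent
	msgAppProbesPaused bool
	inflightsFull      bool // assumed stand-in for pr.Inflights.Full()
}

// canBumpCommit is an assumed definition: the leader has a commit index the
// follower has not been sent yet.
func (pr *progress) canBumpCommit(commit uint64) bool {
	return pr.sentCommit < commit
}

// canSendEntries is an assumed definition: the flow is in the replicating
// (non-throttled) mode, there are entries to send, and inflights are not full.
func (pr *progress) canSendEntries(last uint64) bool {
	return pr.state == stateReplicate && pr.next <= last && !pr.inflightsFull
}

// shouldSendMsgApp mirrors the condition quoted from the diff.
func (pr *progress) shouldSendMsgApp(last, commit uint64) bool {
	return pr.canBumpCommit(commit) ||
		pr.match < last && (!pr.msgAppProbesPaused || pr.canSendEntries(last))
}

func main() {
	// Replicating with room in flight: the paused flag is effectively ignored.
	replicating := progress{state: stateReplicate, match: 5, next: 8,
		sentCommit: 5, msgAppProbesPaused: true}
	fmt.Println(replicating.shouldSendMsgApp(10, 5))

	// Probing and paused: nothing is sent until the flag is cleared.
	probing := progress{state: stateProbe, match: 5, next: 6,
		sentCommit: 5, msgAppProbesPaused: true}
	fmt.Println(probing.shouldSendMsgApp(10, 5))
}
```

In this sketch the paused flag only matters when `canSendEntries` is false, which matches the explanation that the flag applies to the probing/throttled mode.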
Signed-off-by: Pavel Kalinnikov <[email protected]>
This commit consolidates all decision-making about sending append messages into a single maybeSendAppend method. Previously, the behaviour depended on the sendIfEmpty flag which was set/unset depending on the circumstances in which the method is called. This is unnecessary because the Progress struct contains enough information about the leader->follower flow state, so maybeSendAppend can be made stand-alone.
Signed-off-by: Pavel Kalinnikov <[email protected]>
Signed-off-by: Pavel Kalinnikov <[email protected]>
124006: raft: consolidate all append message sending r=nvanbenschoten a=pav-kv

This PR consolidates all decision-making about sending append messages into a single `maybeSendAppend` method. Previously, the behaviour depended on the `sendIfEmpty` flag which was set/unset depending on the context in which the method is called. This is unnecessary because the `Progress` struct contains enough information about the leader->follower flow state, so `maybeSendAppend` can be made stand-alone.

In follow-up PRs, the consolidated `maybeSendAppend` method will be used to implement a more flexible message flow control.

Ported from etcd-io/raft#134

Epic: CRDB-37515

Release note: none

Co-authored-by: Pavel Kalinnikov <[email protected]>
Closing this PR, since it has been superseded by cockroachdb/cockroach#124006 in CRDB.
This PR consolidates all decision-making about sending append messages into a single `maybeSendAppend` method. Previously, the behaviour depended on the `sendIfEmpty` flag which was set/unset depending on the circumstances in which the method is called. This is unnecessary because the `Progress` struct contains enough information about the leader->follower flow state, so `maybeSendAppend` can be made stand-alone.

In a follow-up PR, the consolidated `maybeSendAppend` method will be used to implement a more flexible message flow control.

Part of #130