-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: consolidate all append message sending #124006
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 8 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @pav-kv)
pkg/raft/raft.go
line 555 at r1 (raw file):
// log has been compacted) it can send a MsgSnap. // // In some cases, the MsgApp message can have zero entries, and yet being sent.
s/being/be/
pkg/raft/raft.go
line 1461 at r1 (raw file):
case pb.MsgHeartbeatResp: pr.RecentActive = true pr.PauseMsgAppProbes(false)
How will we want to handle this in a world where a raft group is not performing any heartbeats? Should the leader's tick loop periodically check which followers are connected (using e.g. store liveness) and call r.maybeSendAppend
on those followers?
pkg/raft/tracker/progress.go
line 182 at r1 (raw file):
// the passed-in bool. func (pr *Progress) PauseMsgAppProbes(pause bool) { pr.MsgAppFlowPaused = pause
If we are changing the meaning of this flag to only refer to empty MsgApps, should we rename it to MsgAppProbesPaused
?
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
// // In StateSnapshot, we do not send append messages. func (pr *Progress) ShouldSendMsgApp(last, commit uint64) bool {
I'm finding the relationship between the CanSendEntries
, ShouldSendMsgApp
, and IsPaused
predicates to be difficult to understand. ShouldSendMsgApp
and IsPaused
are close to inverses of each other, but they're not exactly inverses.
Are we intending to phase out IsPaused
?
If not, can one be written in terms of the other to make the relationship more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @nvanbenschoten)
pkg/raft/raft.go
line 1461 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How will we want to handle this in a world where a raft group is not performing any heartbeats? Should the leader's tick loop periodically check which followers are connected (using e.g. store liveness) and call
r.maybeSendAppend
on those followers?
Will we still have the tick loop? I guess so, since there are some things still depending on it (including some timeouts, like the reproposal threshold, expressed as tick counts).
The way it works now: in a throttled state there is a maybeSendAppend
roughly once per heartbeat interval (if the heartbeat responses indicate that the follower is connected).
To mimic this behaviour, we need to communicate the "peer is connected" bit to the MsgApp flow loop. We can use the liveness fabric to get this signal.
If we retain the tick loop, then we would be sending a MsgApp once in a few ticks. If we don't retain the tick loop, we need something else to schedule these periodic MsgApp heartbeats. For example, we can replace the tick loop with a MsgApp heartbeat loop, or tie this to some other periodic "scan all ranges" kind of job.
pkg/raft/tracker/progress.go
line 182 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we are changing the meaning of this flag to only refer to empty MsgApps, should we rename it to
MsgAppProbesPaused
?
Yes, will add commit etcd-io/raft@91981c3 to this PR soon.
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm finding the relationship between the
CanSendEntries
,ShouldSendMsgApp
, andIsPaused
predicates to be difficult to understand.ShouldSendMsgApp
andIsPaused
are close to inverses of each other, but they're not exactly inverses.Are we intending to phase out
IsPaused
?If not, can one be written in terms of the other to make the relationship more clear?
IsPaused
is legacy. I would remove it here, but it looks like we're using it for replica pausing, so I made it return the same value as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 8 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @pav-kv)
pkg/raft/raft.go
line 555 at r1 (raw file):
// log has been compacted) it can send a MsgSnap. // // In some cases, the MsgApp message can have zero entries, and yet being sent.
nit: ... yet be sent
pkg/raft/raft.go
line 557 at r1 (raw file):
// In some cases, the MsgApp message can have zero entries, and yet being sent. // When the follower log is not fully up-to-date, we must send a MsgApp // periodically so that eventually the flow is either accepted or rejected. Not
basic question: is this the built-in retrying inside raft to account for a lossy network?
pkg/raft/raft.go
line 1461 at r1 (raw file):
where a raft group is not performing any heartbeats
Adding to this, in the future, if the higher layer has only asked for indices up to i
for a peer, so the raft leader knows that only ...,i]
has been sent, and that peer has acked, so Match == i
, there will be no retrying of MsgApp
by raft, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)
pkg/raft/raft.go
line 557 at r1 (raw file):
Previously, sumeerbhola wrote…
basic question: is this the built-in retrying inside raft to account for a lossy network?
Yes. If there are any unacked entries, raft will optimistically send some, and then periodically send an empty MsgApp (roughly once per heartbeat interval currently). Eventually, it will get a response, adjust the flow state, and continue the loop.
pkg/raft/raft.go
line 1461 at r1 (raw file):
Previously, sumeerbhola wrote…
where a raft group is not performing any heartbeats
Adding to this, in the future, if the higher layer has only asked for indices up to
i
for a peer, so the raft leader knows that only...,i]
has been sent, and that peer has acked, soMatch == i
, there will be no retrying ofMsgApp
by raft, yes?
There will be no retrying of entries <= Match
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @pav-kv)
pkg/raft/raft.go
line 1461 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
There will be no retrying of entries <=
Match
.
My question phrasing was not great. I realize there isn't a need to resend entries <= Match, by definition of Match, but what I meant to ask is that since the higher layer has only asked up to i
, the retrying logic will ignore the indices > i
.
95e61b3
to
85ca9b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)
pkg/raft/raft.go
line 1461 at r1 (raw file):
Previously, sumeerbhola wrote…
My question phrasing was not great. I realize there isn't a need to resend entries <= Match, by definition of Match, but what I meant to ask is that since the higher layer has only asked up to
i
, the retrying logic will ignore the indices >i
.
Discussed offline. The MsgApp
"retries" will not include any entries if we are in StateReplicate
. These messages will have Entries = nil
appended after {index = Next-1, term = log.term(Next-1)}
.
pkg/raft/tracker/progress.go
line 182 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
Yes, will add commit etcd-io/raft@91981c3 to this PR soon.
Done
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
IsPaused
is legacy. I would remove it here, but it looks like we're using it for replica pausing, so I made it return the same value as before.
I made the relationship more clear: ShouldSendMsgApp
now depends on IsPaused
. PTAL.
35eee35
to
3b7c2ed
Compare
@nvanbenschoten @sumeerbhola Addressed your comments. PTAL. |
ba6b7c8
to
2b3ab55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 8 files at r4, 2 of 2 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @pav-kv, and @sumeerbhola)
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
I made the relationship more clear:
ShouldSendMsgApp
now depends onIsPaused
. PTAL.
I'm not sure that this is an improvement. We're still referencing MsgAppFlowPaused
is two places and now we're inlining CanSendEntries
as the pr.Next <= last
.
Could this have been expressed as (with some of the improved commentary):
return pr.CanBumpCommit(commit) ||
pr.Match < last && (!pr.IsPaused() || pr.CanSendEntries(last))
pkg/raft/tracker/progress.go
line 100 at r6 (raw file):
or StateReplicate with saturated Inflights
Is this still true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @sumeerbhola)
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm not sure that this is an improvement. We're still referencing
MsgAppFlowPaused
is two places and now we're inliningCanSendEntries
as thepr.Next <= last
.Could this have been expressed as (with some of the improved commentary):
return pr.CanBumpCommit(commit) || pr.Match < last && (!pr.IsPaused() || pr.CanSendEntries(last))
The expression you proposed not work (some tests fail). What works is the below (does it look clean?):
return pr.CanSendEntries(last) ||
pr.CanBumpCommit(commit) ||
pr.Match < last && !pr.MsgAppProbesPaused // need an empty "probe" MsgApp sometimes
But it doesn't depend on IsPaused
again. This is really subtle, and I don't know a good way to not use MsgAppProbesPaused
here.
This PR consolidates all decision-making about sending append messages into a single maybeSendAppend method. Previously, the behaviour depended on the sendIfEmpty flag which was set/unset depending on the context in which the method is called. This is unnecessary because the Progress struct contains enough information about the leader->follower flow state, so maybeSendAppend can be made stand-alone. In follow-up PRs, the consolidated maybeSendAppend method will be used to implement a more flexible message flow control. Epic: CRDB-37515 Release note: none
Epic: none Release note: none
2b3ab55
to
400b4b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)
pkg/raft/tracker/progress.go
line 100 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
or StateReplicate with saturated Inflights
Is this still true?
I think it is, see ShouldSendMsgApp
. More precisely, the cases when this flag matters are:
StateProbe
- throttle probes.StateReplicate
withInflights.Full()
- throttle emptyMsgApp
s.StateReplicate
withMatch < lastIndex && Next == lastIndex + 1
("everything is in flight"). In this case we throttle the emptyMsgApps
too, until we're sure everything is replicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r7, 2 of 8 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @pav-kv, and @sumeerbhola)
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
The expression you proposed not work (some tests fail). What works is the below (does it look clean?):
return pr.CanSendEntries(last) || pr.CanBumpCommit(commit) || pr.Match < last && !pr.MsgAppProbesPaused // need an empty "probe" MsgApp sometimesBut it doesn't depend on
IsPaused
again. This is really subtle, and I don't know a good way to not useMsgAppProbesPaused
here.
This is cleaner to me, as it groups together real MsgApps, probe MsgApps, and commit-only MsgApps. Thanks for reworking it a few times.
The only nit I have (which you can ignore or address separately) is that you could extract pr.Match < last && !pr.MsgAppFlowPaused
into a Progress.ShouldSendProbe
method to package up that condition and centralize the commentary around it.
pkg/raft/tracker/progress.go
line 100 at r6 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
I think it is, see
ShouldSendMsgApp
. More precisely, the cases when this flag matters are:
StateProbe
- throttle probes.StateReplicate
withInflights.Full()
- throttle emptyMsgApp
s.StateReplicate
withMatch < lastIndex && Next == lastIndex + 1
("everything is in flight"). In this case we throttle the emptyMsgApps
too, until we're sure everything is replicated.
Yeah I see, I had misread this as "is set to true" instead of "is used". The comment seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @sumeerbhola)
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is cleaner to me, as it groups together real MsgApps, probe MsgApps, and commit-only MsgApps. Thanks for reworking it a few times.
The only nit I have (which you can ignore or address separately) is that you could extract
pr.Match < last && !pr.MsgAppFlowPaused
into aProgress.ShouldSendProbe
method to package up that condition and centralize the commentary around it.
I'm thinking that maybe it would be better to return some enum or struct encapsulating what can be done, instead of having multiple semi-related Progress
methods that have to be called in a certain order.
In maybeSendAppend
, we call ShouldSendMsgApp
, and then call CanSendEntries
to determine (again, despite ShouldSendMsgApp
has already done it) whether some entries can be attached. These can be tied together.
Similarly, the Progress.ShouldSendProbe
method you described only makes sense in a context, after having ruled out the possibility of a "real" MsgApp
.
I'll think about it a bit more, and may clean this up further in future PRs when this gets closer to the outer API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @sumeerbhola)
pkg/raft/tracker/progress.go
line 299 at r1 (raw file):
Previously, pav-kv (Pavel Kalinnikov) wrote…
I'm thinking that maybe it would be better to return some enum or struct encapsulating what can be done, instead of having multiple semi-related
Progress
methods that have to be called in a certain order.In
maybeSendAppend
, we callShouldSendMsgApp
, and then callCanSendEntries
to determine (again, despiteShouldSendMsgApp
has already done it) whether some entries can be attached. These can be tied together.Similarly, the
Progress.ShouldSendProbe
method you described only makes sense in a context, after having ruled out the possibility of a "real"MsgApp
.I'll think about it a bit more, and may clean this up further in future PRs when this gets closer to the outer API.
👍
TFTRs! bors r=nvanbenschoten @arulajmani Feel free to add comments async, I will be iterating on this further and add you to future PR reviews. |
This PR consolidates all decision-making about sending append messages into a single
maybeSendAppend
method. Previously, the behaviour depended on thesendIfEmpty
flag which was set/unset depending on the context in which the method is called. This is unnecessary because theProgress
struct contains enough information about the leader->follower flow state, somaybeSendAppend
can be made stand-alone.In follow-up PRs, the consolidated
maybeSendAppend
method will be used to implement a more flexible message flow control.Ported from etcd-io/raft#134
Epic: CRDB-37515
Release note: none