Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: consolidate all append message sending #124006

Merged
merged 2 commits into from
May 29, 2024

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented May 12, 2024

This PR consolidates all decision-making about sending append messages into a single maybeSendAppend method. Previously, the behaviour depended on the sendIfEmpty flag which was set/unset depending on the context in which the method is called. This is unnecessary because the Progress struct contains enough information about the leader->follower flow state, so maybeSendAppend can be made stand-alone.

In follow-up PRs, the consolidated maybeSendAppend method will be used to implement a more flexible message flow control.

Ported from etcd-io/raft#134

Epic: CRDB-37515
Release note: none

@pav-kv pav-kv requested a review from a team May 12, 2024 14:27
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @pav-kv)


pkg/raft/raft.go line 555 at r1 (raw file):

// log has been compacted) it can send a MsgSnap.
//
// In some cases, the MsgApp message can have zero entries, and yet being sent.

s/being/be/


pkg/raft/raft.go line 1461 at r1 (raw file):

	case pb.MsgHeartbeatResp:
		pr.RecentActive = true
		pr.PauseMsgAppProbes(false)

How will we want to handle this in a world where a raft group is not performing any heartbeats? Should the leader's tick loop periodically check which followers are connected (using e.g. store liveness) and call r.maybeSendAppend on those followers?


pkg/raft/tracker/progress.go line 182 at r1 (raw file):

// the passed-in bool.
func (pr *Progress) PauseMsgAppProbes(pause bool) {
	pr.MsgAppFlowPaused = pause

If we are changing the meaning of this flag to only refer to empty MsgApps, should we rename it to MsgAppProbesPaused?


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

//
// In StateSnapshot, we do not send append messages.
func (pr *Progress) ShouldSendMsgApp(last, commit uint64) bool {

I'm finding the relationship between the CanSendEntries, ShouldSendMsgApp, and IsPaused predicates to be difficult to understand. ShouldSendMsgApp and IsPaused are close to inverses of each other, but they're not exactly inverses.

Are we intending to phase out IsPaused?

If not, can one be written in terms of the other to make the relationship more clear?

@andrewbaptist andrewbaptist requested a review from kvoli May 14, 2024 14:39
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @nvanbenschoten)


pkg/raft/raft.go line 1461 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How will we want to handle this in a world where a raft group is not performing any heartbeats? Should the leader's tick loop periodically check which followers are connected (using e.g. store liveness) and call r.maybeSendAppend on those followers?

Will we still have the tick loop? I guess so, since there are some things still depending on it (including some timeouts, like the reproposal threshold, expressed as tick counts).

The way it works now: in a throttled state there is a maybeSendAppend roughly once per heartbeat interval (if the heartbeat responses indicate that the follower is connected).

To mimic this behaviour, we need to communicate the "peer is connected" bit to the MsgApp flow loop. We can use the liveness fabric to get this signal.

If we retain the tick loop, then we would be sending a MsgApp once in a few ticks. If we don't retain the tick loop, we need something else to schedule these periodic MsgApp heartbeats. For example, we can replace the tick loop with a MsgApp heartbeat loop, or tie this to some other periodic "scan all ranges" kind of job.


pkg/raft/tracker/progress.go line 182 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we are changing the meaning of this flag to only refer to empty MsgApps, should we rename it to MsgAppProbesPaused?

Yes, will add commit etcd-io/raft@91981c3 to this PR soon.


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm finding the relationship between the CanSendEntries, ShouldSendMsgApp, and IsPaused predicates to be difficult to understand. ShouldSendMsgApp and IsPaused are close to inverses of each other, but they're not exactly inverses.

Are we intending to phase out IsPaused?

If not, can one be written in terms of the other to make the relationship more clear?

IsPaused is legacy. I would remove it here, but it looks like we're using it for replica pausing, so I made it return the same value as before.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @pav-kv)


pkg/raft/raft.go line 555 at r1 (raw file):

// log has been compacted) it can send a MsgSnap.
//
// In some cases, the MsgApp message can have zero entries, and yet being sent.

nit: ... yet be sent


pkg/raft/raft.go line 557 at r1 (raw file):

// In some cases, the MsgApp message can have zero entries, and yet being sent.
// When the follower log is not fully up-to-date, we must send a MsgApp
// periodically so that eventually the flow is either accepted or rejected. Not

basic question: is this the built-in retrying inside raft to account for a lossy network?


pkg/raft/raft.go line 1461 at r1 (raw file):

where a raft group is not performing any heartbeats

Adding to this, in the future, if the higher layer has only asked for indices up to i for a peer, so the raft leader knows that only ...,i] has been sent, and that peer has acked, so Match == i, there will be no retrying of MsgApp by raft, yes?

Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)


pkg/raft/raft.go line 557 at r1 (raw file):

Previously, sumeerbhola wrote…

basic question: is this the built-in retrying inside raft to account for a lossy network?

Yes. If there are any unacked entries, raft will optimistically send some, and then periodically send an empty MsgApp (roughly once per heartbeat interval currently). Eventually, it will get a response, adjust the flow state, and continue the loop.


pkg/raft/raft.go line 1461 at r1 (raw file):

Previously, sumeerbhola wrote…

where a raft group is not performing any heartbeats

Adding to this, in the future, if the higher layer has only asked for indices up to i for a peer, so the raft leader knows that only ...,i] has been sent, and that peer has acked, so Match == i, there will be no retrying of MsgApp by raft, yes?

There will be no retrying of entries <= Match.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @pav-kv)


pkg/raft/raft.go line 1461 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

There will be no retrying of entries <= Match.

My question phrasing was not great. I realize there isn't a need to resend entries <= Match, by definition of Match, but what I meant to ask is that since the higher layer has only asked up to i, the retrying logic will ignore the indices > i.

@pav-kv pav-kv force-pushed the raft-consolidate-append branch from 95e61b3 to 85ca9b8 Compare May 28, 2024 13:39
@pav-kv pav-kv requested a review from a team as a code owner May 28, 2024 13:44
@pav-kv pav-kv requested a review from sumeerbhola May 28, 2024 13:44
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)


pkg/raft/raft.go line 1461 at r1 (raw file):

Previously, sumeerbhola wrote…

My question phrasing was not great. I realize there isn't a need to resend entries <= Match, by definition of Match, but what I meant to ask is that since the higher layer has only asked up to i, the retrying logic will ignore the indices > i.

Discussed offline. The MsgApp "retries" will not include any entries if we are in StateReplicate. These messages will have Entries = nil appended after {index = Next-1, term = log.term(Next-1)}.


pkg/raft/tracker/progress.go line 182 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Yes, will add commit etcd-io/raft@91981c3 to this PR soon.

Done


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

IsPaused is legacy. I would remove it here, but it looks like we're using it for replica pausing, so I made it return the same value as before.

I made the relationship more clear: ShouldSendMsgApp now depends on IsPaused. PTAL.

@pav-kv pav-kv force-pushed the raft-consolidate-append branch from 35eee35 to 3b7c2ed Compare May 28, 2024 13:45
@pav-kv
Copy link
Collaborator Author

pav-kv commented May 28, 2024

@nvanbenschoten @sumeerbhola Addressed your comments. PTAL.

@pav-kv pav-kv force-pushed the raft-consolidate-append branch 2 times, most recently from ba6b7c8 to 2b3ab55 Compare May 28, 2024 14:02
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r4, 2 of 2 files at r5, 8 of 8 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @pav-kv, and @sumeerbhola)


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

I made the relationship more clear: ShouldSendMsgApp now depends on IsPaused. PTAL.

I'm not sure that this is an improvement. We're still referencing MsgAppFlowPaused is two places and now we're inlining CanSendEntries as the pr.Next <= last.

Could this have been expressed as (with some of the improved commentary):

return pr.CanBumpCommit(commit) ||
        pr.Match < last && (!pr.IsPaused() || pr.CanSendEntries(last))

pkg/raft/tracker/progress.go line 100 at r6 (raw file):

or StateReplicate with saturated Inflights

Is this still true?

Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, and @sumeerbhola)


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm not sure that this is an improvement. We're still referencing MsgAppFlowPaused is two places and now we're inlining CanSendEntries as the pr.Next <= last.

Could this have been expressed as (with some of the improved commentary):

return pr.CanBumpCommit(commit) ||
        pr.Match < last && (!pr.IsPaused() || pr.CanSendEntries(last))

The expression you proposed not work (some tests fail). What works is the below (does it look clean?):

return pr.CanSendEntries(last) ||
	pr.CanBumpCommit(commit) ||
	pr.Match < last && !pr.MsgAppProbesPaused // need an empty "probe" MsgApp sometimes

But it doesn't depend on IsPaused again. This is really subtle, and I don't know a good way to not use MsgAppProbesPaused here.

pav-kv added 2 commits May 28, 2024 17:44
This PR consolidates all decision-making about sending append messages
into a single maybeSendAppend method. Previously, the behaviour depended
on the sendIfEmpty flag which was set/unset depending on the context in
which the method is called. This is unnecessary because the Progress
struct contains enough information about the leader->follower flow
state, so maybeSendAppend can be made stand-alone.

In follow-up PRs, the consolidated maybeSendAppend method will be used
to implement a more flexible message flow control.

Epic: CRDB-37515
Release note: none
Epic: none
Release note: none
@pav-kv pav-kv force-pushed the raft-consolidate-append branch from 2b3ab55 to 400b4b1 Compare May 28, 2024 16:46
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist, @kvoli, @nvanbenschoten, and @sumeerbhola)


pkg/raft/tracker/progress.go line 100 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

or StateReplicate with saturated Inflights

Is this still true?

I think it is, see ShouldSendMsgApp. More precisely, the cases when this flag matters are:

  1. StateProbe - throttle probes.
  2. StateReplicate with Inflights.Full() - throttle empty MsgApps.
  3. StateReplicate with Match < lastIndex && Next == lastIndex + 1 ("everything is in flight"). In this case we throttle the empty MsgApps too, until we're sure everything is replicated.

@arulajmani arulajmani self-requested a review May 28, 2024 18:53
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r7, 2 of 8 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @pav-kv, and @sumeerbhola)


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

The expression you proposed not work (some tests fail). What works is the below (does it look clean?):

return pr.CanSendEntries(last) ||
	pr.CanBumpCommit(commit) ||
	pr.Match < last && !pr.MsgAppProbesPaused // need an empty "probe" MsgApp sometimes

But it doesn't depend on IsPaused again. This is really subtle, and I don't know a good way to not use MsgAppProbesPaused here.

This is cleaner to me, as it groups together real MsgApps, probe MsgApps, and commit-only MsgApps. Thanks for reworking it a few times.

The only nit I have (which you can ignore or address separately) is that you could extract pr.Match < last && !pr.MsgAppFlowPaused into a Progress.ShouldSendProbe method to package up that condition and centralize the commentary around it.


pkg/raft/tracker/progress.go line 100 at r6 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

I think it is, see ShouldSendMsgApp. More precisely, the cases when this flag matters are:

  1. StateProbe - throttle probes.
  2. StateReplicate with Inflights.Full() - throttle empty MsgApps.
  3. StateReplicate with Match < lastIndex && Next == lastIndex + 1 ("everything is in flight"). In this case we throttle the empty MsgApps too, until we're sure everything is replicated.

Yeah I see, I had misread this as "is set to true" instead of "is used". The comment seems fine.

Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @sumeerbhola)


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is cleaner to me, as it groups together real MsgApps, probe MsgApps, and commit-only MsgApps. Thanks for reworking it a few times.

The only nit I have (which you can ignore or address separately) is that you could extract pr.Match < last && !pr.MsgAppFlowPaused into a Progress.ShouldSendProbe method to package up that condition and centralize the commentary around it.

I'm thinking that maybe it would be better to return some enum or struct encapsulating what can be done, instead of having multiple semi-related Progress methods that have to be called in a certain order.

In maybeSendAppend, we call ShouldSendMsgApp, and then call CanSendEntries to determine (again, despite ShouldSendMsgApp has already done it) whether some entries can be attached. These can be tied together.

Similarly, the Progress.ShouldSendProbe method you described only makes sense in a context, after having ruled out the possibility of a "real" MsgApp.

I'll think about it a bit more, and may clean this up further in future PRs when this gets closer to the outer API.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @sumeerbhola)


pkg/raft/tracker/progress.go line 299 at r1 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

I'm thinking that maybe it would be better to return some enum or struct encapsulating what can be done, instead of having multiple semi-related Progress methods that have to be called in a certain order.

In maybeSendAppend, we call ShouldSendMsgApp, and then call CanSendEntries to determine (again, despite ShouldSendMsgApp has already done it) whether some entries can be attached. These can be tied together.

Similarly, the Progress.ShouldSendProbe method you described only makes sense in a context, after having ruled out the possibility of a "real" MsgApp.

I'll think about it a bit more, and may clean this up further in future PRs when this gets closer to the outer API.

👍

@pav-kv
Copy link
Collaborator Author

pav-kv commented May 29, 2024

TFTRs!

bors r=nvanbenschoten

@arulajmani Feel free to add comments async, I will be iterating on this further and add you to future PR reviews.

@craig craig bot merged commit 415d06c into cockroachdb:master May 29, 2024
22 checks passed
@pav-kv pav-kv deleted the raft-consolidate-append branch May 29, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants