-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: add WaitLeader function to common framework #14304
Conversation
1b62e29
to
1ce0f5f
Compare
|
||
// WaitMembersForLeader waits until given members agree on the same leader, | ||
// and returns its 'index' in the 'membs' list | ||
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you considered refactoring this and cluster.waitMembersForLeader into a common function to avoid code duplication?
Seams like you've changed some logic here and you aren't using m.Server.Lead(), is there a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This idea of refracting has once crossed my mind. Then I realised that there is no equivalent function to m.Server.Lead()
in e2e cluster code, or something like m.Server.StopNotify()
.
e2e cluster has less control of details of a member. To get member ID in e2e cluster, membs[i].Client().Status()
is the way to go to my knowledge.
I did change some logic. For example, in integration cluster, waitMembersForLeader
select on m.Server.StopNotify()
to detect if an endpoint has stopped; in e2e cluster, if an endpoint has stopped, the test would fail due to network error in resp, err := membs[i].Client().Status()
. I haven't figured out how to deal with this error yet.
etcd/tests/framework/integration/cluster.go
Lines 455 to 470 in 9bb1342
for lead == 0 || !possibleLead[lead] { | |
lead = 0 | |
for _, m := range membs { | |
select { | |
case <-m.Server.StopNotify(): | |
continue | |
default: | |
} | |
if lead != 0 && lead != m.Server.Lead() { | |
lead = 0 | |
time.Sleep(10 * TickDuration) | |
break | |
} | |
lead = m.Server.Lead() | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are familiar with this part, please give me more advice. It would be helpful to me🫰
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkfw thanks for your great work helping with the common framework testing. But here, I am also having a same question. With common framework, we expect to reduce code by removing duplication. I thought WaitLeader
function is being moved to common framework from integration/cluster.go, and then it can be used in both integration and e2e as you have explained in the #14305 ? cc @ahrtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @spzala. Usually we should try to implement common functions/methods which can be used by both e2e and function test. But it seems that there is no easy way to implement a common waitMembersForLeader
, because e2e and integration depend on different solution to wait for the leader. I think it's OK for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spzala FYI, commented about difficulty with reusing new impl in integration/cluster.go
one runs into circular imports when trying to call this new helper (part of 'framework' package) from framework/intergration/cluster.go (part of 'integration'), because 'framework' imports 'integration' here
There is probably a way to get this working, but it might involve changing existing packages/interfaces, so might be not worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI. #14328 (comment) cc @spzala @lavacat
I think this PR should be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahrtr new implementation of WaitLeader is using only framework/interface.go interfaces. So, in principle it can be moved into a shared helper function. I was trying get a POC for this, but run into some circular imports issues. If there is interest in doing this, please open a new issue.
@lavacat Are you still working on the PoC? If yes, feel free to deliver a PR and we can discuss it in your PR. It's also OK to merge this PR firstly, then you can continue to work on your PoC/PR afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I didn't solve circular import problem. Feel free to merge this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1ce0f5f
to
5091062
Compare
Codecov Report
@@ Coverage Diff @@
## main #14304 +/- ##
=======================================
Coverage 75.63% 75.63%
=======================================
Files 457 457
Lines 37048 37048
=======================================
Hits 28020 28020
Misses 7288 7288
Partials 1740 1740
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
Created amended version of your PR #14313 as an example. It's a draft to showcase possible solution. I'll close it out and we can keep working on this PR. The idea is to convert WaitLeader and WaitMembersForLeader to helper functions that use Cluster interface. @serathius this might be a good use case for you to review. How do we share implementation of methods that only use Cluster interface? Instead of helper methods we can use embeddings, but that will change your original interface. |
It's a neat solution in a more idiomatic way. I really like it :) |
5091062
to
b211a30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkfw thanks for your great work helping with the common framework testing. I have few inline comments.
tests/common/wait_leader_test.go
Outdated
} | ||
|
||
if lead1 == lead2 { | ||
t.Fatalf("leader did not change as expected after stopping memberts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo memberts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the typo.
tests/common/wait_leader_test.go
Outdated
|
||
leader := clus.WaitLeader(t) | ||
if leader < 0 || leader >= len(clus.Members()) { | ||
t.Fatalf("failed to wait leader") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide the return value from WaitLeader in the message as an added help for debug, here and other places? Also, how about, for a better context, replace failed to wait leader
with WaitLeader failed
with the returned value as I mentioned above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, WaitLeader failed
is a better for the log message. I've change the log message and it contains leader
value now.
tests/common/wait_leader_test.go
Outdated
|
||
lead1 := clus.WaitLeader(t) | ||
if lead1 < 0 || lead1 >= len(clus.Members()) { | ||
t.Fatalf("failed to wait leader") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar thought as above that we should print return value of WaitLeader in the log message. Also, Probably we should say WaitLeader failed for lead1
and similar for lead2
for better debugging.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error is unexpected, which means that it might be a potential bug. I think it's OK to just fail the test case.
Minor comment, there is no any parameter, so t.Fatal
should be used instead of t.Fatalf
. But the better way to get info included in the error message just as @spzala pointed out,
t.Fatalf("WaitLeader failed for the leader index (%d) is out of range, cluster member count: %d", len(clus.Members()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @ahrtr for explaining very well
|
||
// WaitMembersForLeader waits until given members agree on the same leader, | ||
// and returns its 'index' in the 'membs' list | ||
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clarkfw thanks for your great work helping with the common framework testing. But here, I am also having a same question. With common framework, we expect to reduce code by removing duplication. I thought WaitLeader
function is being moved to common framework from integration/cluster.go, and then it can be used in both integration and e2e as you have explained in the #14305 ? cc @ahrtr
b211a30
to
51c71c4
Compare
Signed-off-by: Clark <[email protected]>
d3ef5bb
to
a5409c6
Compare
|
||
// WaitMembersForLeader waits until given members agree on the same leader, | ||
// and returns its 'index' in the 'membs' list | ||
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lavacat @spzala @ahrtr thanks for reviewing and giving suggestions which have helped me a lot.
The WaitLeader
function in integration/cluster.go
depends on low level details which should only be accessible in integration/cluster.go
. That's why I implemented the WaitLeader
function in 'e2e.go'.
As you see, my PR did not reduce code by removing duplication. I think @lavacat did it by providing a more elegant implementation, which doesn't depend on low level details of integration cluster or e2e cluster. I'm gonna take a good look at that.
|
||
"go.etcd.io/etcd/client/pkg/v3/testutil" | ||
"go.etcd.io/etcd/tests/v3/framework/config" | ||
"go.etcd.io/etcd/tests/v3/framework/e2e" | ||
) | ||
|
||
const TickDuration = 10 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment: the TickDuration
is only used in e2e.go
, no need to expose it (the first letter uppercase).
I am thinking probably it makes more sense to just define one public const
to be used for both e2e and integration ( cluster.go#L72 ) . Of course, we can address this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds good👍
#14344 was opened to address this.
@spzala This PR is good to go for me. Do you have any other comments or concerns? thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Defer to @ahrtr Thanks @clarkfw for addressing my comments!
Thanks @spzala Great work! @clarkfw |
This PR relates to #14305