Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: add WaitLeader function to common framework #14304

Merged
merged 1 commit into from
Aug 14, 2022

Conversation

clement2026
Copy link
Contributor

@clement2026 clement2026 commented Aug 3, 2022

This PR relates to #14305

@clement2026 clement2026 force-pushed the e2e-cluster-waitleader branch 2 times, most recently from 1b62e29 to 1ce0f5f Compare August 3, 2022 09:30

// WaitMembersForLeader waits until given members agree on the same leader,
// and returns its 'index' in the 'membs' list
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you considered refactoring this and cluster.waitMembersForLeader into a common function to avoid code duplication?
Seams like you've changed some logic here and you aren't using m.Server.Lead(), is there a reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This idea of refracting has once crossed my mind. Then I realised that there is no equivalent function to m.Server.Lead() in e2e cluster code, or something like m.Server.StopNotify().

e2e cluster has less control of details of a member. To get member ID in e2e cluster, membs[i].Client().Status() is the way to go to my knowledge.

I did change some logic. For example, in integration cluster, waitMembersForLeader select on m.Server.StopNotify() to detect if an endpoint has stopped; in e2e cluster, if an endpoint has stopped, the test would fail due to network error in resp, err := membs[i].Client().Status(). I haven't figured out how to deal with this error yet.

for lead == 0 || !possibleLead[lead] {
lead = 0
for _, m := range membs {
select {
case <-m.Server.StopNotify():
continue
default:
}
if lead != 0 && lead != m.Server.Lead() {
lead = 0
time.Sleep(10 * TickDuration)
break
}
lead = m.Server.Lead()
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are familiar with this part, please give me more advice. It would be helpful to me🫰

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkfw thanks for your great work helping with the common framework testing. But here, I am also having a same question. With common framework, we expect to reduce code by removing duplication. I thought WaitLeader function is being moved to common framework from integration/cluster.go, and then it can be used in both integration and e2e as you have explained in the #14305 ? cc @ahrtr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @spzala. Usually we should try to implement common functions/methods which can be used by both e2e and function test. But it seems that there is no easy way to implement a common waitMembersForLeader, because e2e and integration depend on different solution to wait for the leader. I think it's OK for now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spzala FYI, commented about difficulty with reusing new impl in integration/cluster.go

one runs into circular imports when trying to call this new helper (part of 'framework' package) from framework/intergration/cluster.go (part of 'integration'), because 'framework' imports 'integration' here
There is probably a way to get this working, but it might involve changing existing packages/interfaces, so might be not worth the effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI. #14328 (comment) cc @spzala @lavacat

I think this PR should be good to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr new implementation of WaitLeader is using only framework/interface.go interfaces. So, in principle it can be moved into a shared helper function. I was trying get a POC for this, but run into some circular imports issues. If there is interest in doing this, please open a new issue.

@lavacat Are you still working on the PoC? If yes, feel free to deliver a PR and we can discuss it in your PR. It's also OK to merge this PR firstly, then you can continue to work on your PoC/PR afterwards.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't solve circular import problem. Feel free to merge this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahrtr @lavacat for the discussion. sgtm!

@clement2026 clement2026 force-pushed the e2e-cluster-waitleader branch from 1ce0f5f to 5091062 Compare August 4, 2022 04:42
@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

Merging #14304 (b2726c4) into main (b2726c4) will not change coverage.
The diff coverage is n/a.

❗ Current head b2726c4 differs from pull request most recent head b211a30. Consider uploading reports for the commit b211a30 to get more accurate results

@@           Coverage Diff           @@
##             main   #14304   +/-   ##
=======================================
  Coverage   75.63%   75.63%           
=======================================
  Files         457      457           
  Lines       37048    37048           
=======================================
  Hits        28020    28020           
  Misses       7288     7288           
  Partials     1740     1740           
Flag Coverage Δ
all 75.63% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@lavacat lavacat mentioned this pull request Aug 5, 2022
@lavacat
Copy link

lavacat commented Aug 5, 2022

Created amended version of your PR #14313 as an example. It's a draft to showcase possible solution. I'll close it out and we can keep working on this PR.

The idea is to convert WaitLeader and WaitMembersForLeader to helper functions that use Cluster interface.
In integration/cluster.go, integrationCluster can be converted and we can use the same helper.

@serathius this might be a good use case for you to review. How do we share implementation of methods that only use Cluster interface? Instead of helper methods we can use embeddings, but that will change your original interface.

@clement2026
Copy link
Contributor Author

Created amended version of your PR #14313 as an example. It's a draft to showcase possible solution. I'll close it out and we can keep working on this PR.

The idea is to convert WaitLeader and WaitMembersForLeader to helper functions that use Cluster interface. In integration/cluster.go, integrationCluster can be converted and we can use the same helper.

@serathius this might be a good use case for you to review. How do we share implementation of methods that only use Cluster interface? Instead of helper methods we can use embeddings, but that will change your original interface.

It's a neat solution in a more idiomatic way. I really like it :)

@serathius
Copy link
Member

serathius commented Aug 7, 2022

cc @ahrtr @spzala as I'm on vacation.

tests/framework/e2e/etcdctl.go Outdated Show resolved Hide resolved
tests/framework/e2e.go Show resolved Hide resolved
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thank you @clarkfw

cc @spzala @serathius @ptabor

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkfw thanks for your great work helping with the common framework testing. I have few inline comments.

}

if lead1 == lead2 {
t.Fatalf("leader did not change as expected after stopping memberts")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo memberts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed the typo.


leader := clus.WaitLeader(t)
if leader < 0 || leader >= len(clus.Members()) {
t.Fatalf("failed to wait leader")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should provide the return value from WaitLeader in the message as an added help for debug, here and other places? Also, how about, for a better context, replace failed to wait leader with WaitLeader failed with the returned value as I mentioned above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, WaitLeader failed is a better for the log message. I've change the log message and it contains leader value now.


lead1 := clus.WaitLeader(t)
if lead1 < 0 || lead1 >= len(clus.Members()) {
t.Fatalf("failed to wait leader")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar thought as above that we should print return value of WaitLeader in the log message. Also, Probably we should say WaitLeader failed for lead1 and similar for lead2 for better debugging.?

Copy link
Member

@ahrtr ahrtr Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is unexpected, which means that it might be a potential bug. I think it's OK to just fail the test case.

Minor comment, there is no any parameter, so t.Fatal should be used instead of t.Fatalf. But the better way to get info included in the error message just as @spzala pointed out,

t.Fatalf("WaitLeader failed for the leader index (%d) is out of range, cluster member count: %d", len(clus.Members()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to @ahrtr for explaining very well

tests/framework/e2e.go Show resolved Hide resolved

// WaitMembersForLeader waits until given members agree on the same leader,
// and returns its 'index' in the 'membs' list
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkfw thanks for your great work helping with the common framework testing. But here, I am also having a same question. With common framework, we expect to reduce code by removing duplication. I thought WaitLeader function is being moved to common framework from integration/cluster.go, and then it can be used in both integration and e2e as you have explained in the #14305 ? cc @ahrtr

@clement2026 clement2026 force-pushed the e2e-cluster-waitleader branch from d3ef5bb to a5409c6 Compare August 11, 2022 04:46
tests/framework/e2e.go Show resolved Hide resolved

// WaitMembersForLeader waits until given members agree on the same leader,
// and returns its 'index' in the 'membs' list
func (c *e2eCluster) WaitMembersForLeader(ctx context.Context, t testing.TB, membs []Member) int {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lavacat @spzala @ahrtr thanks for reviewing and giving suggestions which have helped me a lot.

The WaitLeader function in integration/cluster.go depends on low level details which should only be accessible in integration/cluster.go. That's why I implemented the WaitLeader function in 'e2e.go'.

As you see, my PR did not reduce code by removing duplication. I think @lavacat did it by providing a more elegant implementation, which doesn't depend on low level details of integration cluster or e2e cluster. I'm gonna take a good look at that.


"go.etcd.io/etcd/client/pkg/v3/testutil"
"go.etcd.io/etcd/tests/v3/framework/config"
"go.etcd.io/etcd/tests/v3/framework/e2e"
)

const TickDuration = 10 * time.Millisecond
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: the TickDuration is only used in e2e.go, no need to expose it (the first letter uppercase).

I am thinking probably it makes more sense to just define one public const to be used for both e2e and integration ( cluster.go#L72 ) . Of course, we can address this in a separate PR.

Copy link
Contributor Author

@clement2026 clement2026 Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds good👍
#14344 was opened to address this.

@ahrtr
Copy link
Member

ahrtr commented Aug 13, 2022

@spzala This PR is good to go for me. Do you have any other comments or concerns? thx

Copy link
Member

@spzala spzala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
Defer to @ahrtr Thanks @clarkfw for addressing my comments!

@ahrtr
Copy link
Member

ahrtr commented Aug 14, 2022

Thanks @spzala

Great work! @clarkfw

@ahrtr ahrtr merged commit 012fc51 into etcd-io:main Aug 14, 2022
@clement2026 clement2026 deleted the e2e-cluster-waitleader branch August 14, 2022 10:24
@clement2026
Copy link
Contributor Author

Thanks to @ahrtr and @spzala for your help! This PR won't have gone so well without your collaboration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants