Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix github action e2e test. #555

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

liangyuanpeng
Copy link
Contributor

@liangyuanpeng liangyuanpeng commented Jan 17, 2024

This PR fixes the problem of e2e github action.

The main changes are as follows:

- Fix wrong shell command
- The kind config used in github action e2e uses the same file as exmaples/kind/kind.config

  • Update kind version to v0.20.0
  • Fix wrong config of kind.
  • Temporary solution to flaky issue

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 17, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @liangyuanpeng. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 17, 2024
Copy link
Contributor Author

@liangyuanpeng liangyuanpeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the e2e test only runs on one k8s version. I also want e2e to run on multiple k8s versions at the same time, such as v1.27 v1.28 v1.29.

Maybe it can be completed in the next PR.

.github/workflows/e2e.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
@jkh52
Copy link
Contributor

jkh52 commented Jan 22, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 22, 2024
@jkh52
Copy link
Contributor

jkh52 commented Jan 22, 2024

I think EGREE_SELECTOR_CONFIGURATION_PATH should be named EGRESS_SELECTOR_CONFIGURATION_PATH

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 24, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 24, 2024
@liangyuanpeng
Copy link
Contributor Author

liangyuanpeng commented Jan 24, 2024

I think EGREE_SELECTOR_CONFIGURATION_PATH should be named EGRESS_SELECTOR_CONFIGURATION_PATH

I have Revert it.

Is that near ready to merge?

...
pod/coredns-5d78c9869d-k9c2h condition met
pod/coredns-5d78c9869d-xbwc7 condition met
pod/test created
pod/test condition met
Error from server: Get "https://172.18.0.2:10250/containerLogs/default/test/test": No agent available
Error: Process completed with exit code 1.

There is a problem waiting to be solved. I will come back to work for it tomorrow, so it's not ready to merge it yet.

@jkh52

@jkh52
Copy link
Contributor

jkh52 commented Jan 24, 2024

@aojea and @liangyuanpeng, please note we temporarily disabled the action tests at #559. Hopefully this PR is able to make progress and then restore it. Let me know if you get stuck and want a second set of eyes on the "no agent available" problem.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 24, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 25, 2024
@liangyuanpeng
Copy link
Contributor Author

liangyuanpeng commented Jan 25, 2024

"no agent available"

This is a flaky problem when ipFamily is ipv4, I am checking.

Copy link
Contributor Author

@liangyuanpeng liangyuanpeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I will be on vacation next week, I hope to use a temporary method to merge this PR. After half a month of vacation, if the problem still needs to be handle, I can continue to look at it.

@aojea @jkh52 @tallclair PTAL,Thanks!

.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
@liangyuanpeng liangyuanpeng force-pushed the fix_ghaction_e2e branch 2 times, most recently from ffb522e to 1d12f9f Compare January 25, 2024 09:40
@aojea
Copy link
Contributor

aojea commented Jan 25, 2024

/lgtm

/assign @jkh52

I still don't see this action running 🤔

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 25, 2024
@liangyuanpeng
Copy link
Contributor Author

liangyuanpeng commented Jan 25, 2024

I still don't see this action running 🤔

This workflow requires approval from a maintainer. Learn more about approving workflows.

Need the admin of this repo to approve this GitHub workflow. maybe @jkh52

Seems like it's always need the approve, Because this is not my first contribution, below is a sample screenshot.

image

/usr/local/bin/kubectl run test --image httpd:2
/usr/local/bin/kubectl wait --timeout=1m --for=condition=ready pods test
/usr/local/bin/kubectl get pods -A -owide
/usr/local/bin/kubectl wait --timeout=1m --for=condition=ready pods --namespace=kube-system -l k8s-app=konnectivity-agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth also waiting for -l k8s-app=konnectivity-server to be ready.

(In this 1 control-plane node cluster is is nearly the same, but in a multi control-plane cluster it would be needed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you're talking about waiting the pod of ANP server with
readiness probe.(now, the pod of ANP server have not readiness probe)

konnectivity-server'readiness indicates that at least one Konnectivity Agent is connected.

konnectivity-agent'readiness indicates that the client is connected to at least one proxy server.

Therefore, if readiness is added at the same time, both will enter an infinite loop.

The reason is ANP agent is using kubernetes svc to connect ANP server and ANP server is running with daemonset.

Remind me if i missed something,Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you're talking about waiting the pod of ANP server with readiness probe.(now, the pod of ANP server have not readiness probe)

konnectivity-server'readiness indicates that at least one Konnectivity Agent is connected.

konnectivity-agent'readiness indicates that the client is connected to at least one proxy server.

Yes to above.

Therefore, if readiness is added at the same time, both will enter an infinite loop.

I don't expect this, because waiting on agent (or any) readiness does not depend on control plane egress.

The reason is ANP agent is using kubernetes svc to connect ANP server and ANP server is running with daemonset.

Remind me if i missed something,Thanks.

What really matters (for a given proxy request ,like kubectl logs) is whether the apiserver that handles the request has a konnectivity-server with at least one useful agent. Since the agent readiness only covers "at least 1", it is insufficient in this case. (There was discussion on that feature to add agent readiness mode "connected to all servers" but it is not implemented.) This would be the gap, fixed by waiting for all konnectivity-server to be ready.

@jkh52
Copy link
Contributor

jkh52 commented Jan 25, 2024

/lgtm
/approve

Hold to allow Lan time to inspect logs, etc; feel free to un-hold when ready.
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 25, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jkh52, liangyuanpeng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 25, 2024
@liangyuanpeng
Copy link
Contributor Author

I quickly tested the multi control-plane cluster ( the pod of ANP server without readiness probe) ) scenario using the current configuration and concluded that the test would fail.

There are three ANP servers and two ANP agents. When kubectl logs is requested to ANP serverA, serverA does not have any agent to connect to it, so the request fails.

But the test scenario is necessary and I will create an issue for it later.

/hold cnacel

@liangyuanpeng
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2024
@k8s-ci-robot k8s-ci-robot merged commit 69f92d8 into kubernetes-sigs:master Jan 26, 2024
10 checks passed
@liangyuanpeng liangyuanpeng deleted the fix_ghaction_e2e branch January 26, 2024 06:26
@jkh52
Copy link
Contributor

jkh52 commented Jan 26, 2024

Hooray, nice coverage @liangyuanpeng and @aojea !

@aojea
Copy link
Contributor

aojea commented Jan 27, 2024

👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants