DRA: consolidate pre-submit jobs #33942
Conversation
The -kubetest2 variants were added as part of kubernetes#33550 without considering whether it makes sense to run them automatically. They shouldn't run automatically, because doing so adds no test coverage and is expensive.
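For context, in Prow terms "not running automatically" means the presubmit keeps always_run: false, has no run_if_changed, and is typically marked optional so it can still be started on demand with /test. A minimal sketch of what that could look like for the kubetest2 variant (values are illustrative, not the actual job definition from this PR):

```yaml
presubmits:
  kubernetes/kubernetes:
  - name: pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2  # experimental alternative
    always_run: false   # never triggered automatically
    optional: true      # not required for merge; start on demand with /test <job-name>
    # no run_if_changed: the job only runs when explicitly requested
```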
@@ -4322,11 +4327,12 @@ presubmits:
      limits:
        cpu: 4
        memory: 6Gi
- name: pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2
- name: pull-kubernetes-node-e2e-crio-cgrpv1-dra-kubetest2 # experimental alternative to pull-kubernetes-node-e2e-crio-cgrpv1-dra
@elieser1101: what is the status of these kubetest2 variants?
As far as I can see they don't differ much from the corresponding kubernetes_e2e.py variants. I'd propose dropping the kubernetes_e2e.py jobs.
At the moment the kubetest2 variants are not working because --label-filter is not supported. I started to look at a fix in kubetest2: kubernetes-sigs/kubetest2#285
@elieser1101 Both kubetest2 jobs are "green" in the test grid:
- https://testgrid.k8s.io/sig-node-cri-o#pr-node-kubelet-crio-cgrpv1-dra-kubetest2
- https://testgrid.k8s.io/sig-node-cri-o#pr-node-kubelet-crio-cgrpv2-dra-kubetest2
Do you mean they're not working correctly / as expected?
Yeah, both are green, but that's because the tests are not actually running; we can see that from the build.log:
Ran 0 of 718 Specs in 0.023 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 718 Skipped
PASS
@@ -4276,7 +4278,10 @@ presubmits:
    skip_branches:
    - release-\d+\.\d+ # per-release image
    always_run: false
    run_if_changed: /dra/|/dynamicresources/|/resourceclaim/|/resourceclass/|/podscheduling/|/resourceclaimtemplate/|/dynamic-resource-allocation/|/pkg/apis/resource/|/api/resource/|/test/e2e_node/dra_test.go
    # Automatically testing with one container runtime is sufficient.
    # containerd was picked because it's apparently more common.
@bart0sh: does choosing containerd make sense?
Which runtime is more stable in your experience?
cri-o has been just a little bit more stable lately, but that can and most probably will change if we drop one of the runtimes as you're suggesting.
Can we keep this job as is and stop automatically triggering pull-kubernetes-node-e2e-crio-cgrpv1-dra instead?
cri-o has been just a little bit more stable lately, but that can and most probably will change if we drop one of the runtimes as you're suggesting.
Why would dropping testing of one runtime change how stable the other runtime is?
Sufficient for what?
Sufficient for knowing that kubelet's interaction with a runtime isn't completely broken.
Can we keep this job as is and stop automatically triggering pull-kubernetes-node-e2e-crio-cgrpv1-dra instead?
Sure, if that one was more stable in the past. The goal is to avoid flakes that aren't caused by our code changes, so whatever runtime is more stable is better.
When I added those jobs, my idea was to cover all available runtimes and distros.
That makes perfect sense for periodic testing. That's how we gain assurance that what we will release will work. But we don't need that level of confidence for every single PR that we merge.
But we don't need that level of confidence for every single PR that we merge.
What's wrong with having the ability to see a failure before merging the code that causes it? And why don't we need to test with containerd, COS, and Ubuntu, but do need it for cri-o and Fedora? Where is the border line between PR and CI jobs in this sense?
For pre-submit: what's wrong is that this causes real, significant costs for the project without being justified by the benefit that it brings. We cannot afford to run everything because "there might be failures".
For periodic: here it is more reasonable to expand the scope of platform coverage because we control how often we test. But also here the question arises whether we expect problems on one platform that don't occur on another and whether it should be Kubernetes that tests that or the downstream consumer of Kubernetes. Even cloud providers are expected to run their own testing of Kubernetes these days, not the Kubernetes project itself.
It's hard to argue about costs without having any information about them. However, having one more job triggered on changes to a tiny subset of the codebase wouldn't make a lot of difference in my opinion. So, I'm still for keeping both jobs active.
Let's see what other reviewers say about it.
It's hard to argue about costs without having any information about them
Then just trust Ben when he says that automatically triggered pre-submit jobs shouldn't exist at all or only under very special circumstances. He is SIG testing and infra lead, he knows about the costs.
having one more job triggered on changes to a tiny subset of the codebase
I think you haven't seen all PRs where this got triggered. I got asked several times about them in the context of PRs where they shouldn't have been started.
It's not necessary to cover all possible runtime variants during pre-submit for all PRs. One of them should be sufficient, the rest can be covered during periodic testing. This cuts down CI costs and noise in unrelated PRs that happen to trigger the run_if_changed.
These pre-submit checks are more useful when someone actually modifies source code and less useful for other changes (Go dependencies, Markdown).
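To make that concrete: the consolidated setup keeps a single runtime auto-triggered via run_if_changed scoped to DRA-related paths, while the other variants stay defined but only run on request. A hedged sketch; the containerd job name below is assumed for illustration, and the actual paths are in the diff above:

```yaml
# Auto-triggered only for DRA-related changes, single runtime (containerd).
- name: pull-kubernetes-node-e2e-containerd-dra   # illustrative name, not taken from this PR
  always_run: false
  run_if_changed: /dra/|/dynamic-resource-allocation/|/test/e2e_node/dra_test.go
# Other runtime variants stay available, but only via an explicit /test comment.
- name: pull-kubernetes-node-e2e-crio-cgrpv1-dra
  always_run: false
  optional: true   # assumption: not merge-blocking when triggered manually
```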
/assign @klueska
/priority important-soon
We keep running these extra jobs unnecessarily (kubernetes/kubernetes#129213, kubernetes/kubernetes#129195, kubernetes/kubernetes#129103, etc.).
I am gonna land this, we can iterate if needed. Want to stop the bleeding! /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dims, pohly
The full list of commands accepted by this bot can be found here. The pull request process is described here.
This reduces the number of jobs which get triggered automatically to save costs and avoid noise in unrelated PRs. See kubernetes/community#8196 for a suggested policy and rationale.