vpa-recommender: Add support for configuring global max allowed resources #7560

ialidzhikov · 2024-12-04T13:48:22Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

See #7147

Which issue(s) this PR fixes:

Fixes #7147

Special notes for your reviewer:

N/A

Does this PR introduce a user-facing change?

vpa-recommender now supports two new flags: `--container-recommendation-max-allowed-cpu` and `--container-recommendation-max-allowed-memory`. These flags make it possible to configure the global maximum resources that vpa-recommender can recommend for a container. They address a known limitation of VPA (vpa-recommender in particular): it is not aware of the cluster's capacity and can make a pod unschedulable after a recommendation is applied.
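As a minimal sketch of the intended effect (not the actual recommender code), the cap can be thought of as clamping each container's recommended value to the configured maximum. The function name `capRecommendation` and the plain `int64` millicore units are hypothetical; treating a zero maximum as "flag not set" is an assumption that mirrors the `IsZero()` checks in this PR's `main.go` changes.

```go
package main

import "fmt"

// capRecommendation clamps a recommended value to maxAllowed.
// Assumption: a zero maxAllowed means the flag was not set, so no cap applies.
func capRecommendation(recommended, maxAllowed int64) int64 {
	if maxAllowed == 0 {
		return recommended
	}
	if recommended > maxAllowed {
		return maxAllowed
	}
	return recommended
}

func main() {
	// Values are in millicores; 4000 stands in for a hypothetical global CPU cap.
	fmt.Println(capRecommendation(6000, 4000)) // above the cap, clamped to 4000
	fmt.Println(capRecommendation(1500, 4000)) // below the cap, unchanged
	fmt.Println(capRecommendation(6000, 0))    // no cap configured, unchanged
}
```

With a cap sized to the largest node in the cluster, a recommendation could in principle no longer exceed what any node can offer, which is the unschedulable-pod case the flags target.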

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2024-12-04T13:48:29Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ialidzhikov
Once this PR has been reviewed and has the lgtm label, please assign kwiesmueller for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

vertical-pod-autoscaler/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

adrianmoisey · 2024-12-04T14:18:05Z

Sorry, but I've just gotten a PR merged (#7548) that will cause a merge conflict with yours.
It's only documentation related.

adrianmoisey · 2024-12-08T12:19:03Z

vertical-pod-autoscaler/pkg/recommender/main.go

+	var globalMaxAllowed apiv1.ResourceList
+	if !maxAllowedCPU.Quantity.IsZero() {
+		setGlobalMaxAllowed(&globalMaxAllowed, apiv1.ResourceCPU, maxAllowedCPU.Quantity)
+	}
+	if !maxAllowedMemory.Quantity.IsZero() {
+		setGlobalMaxAllowed(&globalMaxAllowed, apiv1.ResourceMemory, maxAllowedMemory.Quantity)


Should there be some logic to ensure that the global max is greater than the global min?
It may be a bit strange since one is per container and the other per pod

In a perfect world, I agree that validation should exist.
However, I am not sure how to add such validation when the global min allowed flags are on Pod-level and global max allowed flags are on container-level.

In #7147 (comment) I suggested deprecating the global Pod-level min allowed flags because:

they are on Pod-level, and not on container-level. The recommendation is on container-level. Right now, the recommender splits the Pod-level min allowed values across the containers in a Pod.

they are implemented as a ResourceEstimator. This causes the VPA .status.uncappedTarget to be capped as well, which contradicts the definition of this field. It also causes the global min-allowed flags to overwrite the VPA minAllowed field (if specified); there is no merge between the global Pod-level min allowed flags and the VPA's minAllowed field.

If you agree, I can open a dedicated issue for deprecating the global Pod-level min allowed flags and introducing new container-level min allowed equivalents. Validation can then be added between the global container-level min and max allowed flags. WDYT?

However, I am not sure how to add such validation when the global min allowed flags are on Pod-level and global max allowed flags are on container-level.

Yeah, I agree. I can't figure out a way to make the validation work.
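To illustrate why a flag-level check is hard to define, here is a rough sketch. It assumes the Pod-level minimum is divided evenly across containers (the thread only says it is split by the number of containers), and the names `perContainerMin` and `minExceedsMax` are hypothetical. The same pair of flag values conflicts for some Pods and not for others, depending on the container count, which is not known when the flags are parsed.

```go
package main

import "fmt"

// perContainerMin splits a Pod-level minimum (in millicores) across containers.
// Assumption: the split is an even division by the container count.
func perContainerMin(podMinMilli int64, containers int) int64 {
	if containers <= 0 {
		return 0
	}
	return podMinMilli / int64(containers)
}

// minExceedsMax reports whether the effective per-container minimum
// is above the container-level maximum.
func minExceedsMax(podMinMilli, containerMaxMilli int64, containers int) bool {
	return perContainerMin(podMinMilli, containers) > containerMaxMilli
}

func main() {
	const podMin, containerMax = 1000, 400 // hypothetical flag values in millicores
	for _, n := range []int{1, 2, 4} {
		fmt.Printf("containers=%d perContainerMin=%dm conflict=%t\n",
			n, perContainerMin(podMin, n), minExceedsMax(podMin, containerMax, n))
	}
}
```

In this sketch a Pod with one or two containers ends up with a minimum above the maximum, while a Pod with four containers does not.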

In #7147 (comment) I suggested deprecating the global Pod-level min allowed flags because:

I don't think deprecation is necessary yet. Pod level resources may be in the VPA's future, so those flags may be used for Pod level resources.

adrianmoisey · 2024-12-08T12:23:41Z

vertical-pod-autoscaler/docs/examples.md

@@ -108,3 +109,16 @@ These options cannot be used together and are mutually exclusive.
 It is possible to set the failurePolicy of the webhook to `Fail` by passing `--webhook-failure-policy-fail=true` to the VPA admission controller.
 Please use this option with caution as it may be possible to break Pod creation if there is a failure with the VPA.
 Using it in conjunction with `--ignored-vpa-object-namespaces=kube-system` or `--vpa-object-namespace` to reduce risk.
+
+### Specifying global maximum allowed resources to prevent pods from being unschedulable


What do you think about moving this section to the "features.md" page?

There are many sections in examples.md which actually describe features of VPA (Starting multiple recommenders, Custom memory bump-up after OOMKill, Using CPU management with static policy, Controlling eviction behavior based on scaling direction and resource, etc.). I don't think the newly introduced section is different from the existing sections.
IMO, examples and features are overlapping conceptually. If you describe a feature, you usually also add example(s) of how the feature can be used. examples.md and features.md should be merged IMO. This is out-of-scope of the existing PR.

IMO, examples and features are overlapping conceptually. If you describe a feature, you usually also add example(s) of how the feature can be used. examples.md and features.md should be merged IMO. This is out-of-scope of the existing PR.

I agree with you here. The features page is new, and I want to start moving the examples across to the features page, with more description about that feature.

I thought it would be nice for the "global max" feature to be added to features from the start, since it's a better fit there, and will require work to move at a later stage.

(but it's up to you though, the documentation needs a lot of work in general)

I would prefer to keep it consistent with the existing doc. In another PR all the examples doc can be reworked as features sections.

adrianmoisey · 2024-12-08T12:25:35Z

Unsure if you want to do this, but an e2e test for the feature would be great!
If it's not something you want to do, I'm happy to try to figure it out in a different PR.

ialidzhikov · 2024-12-13T14:15:42Z

Unsure if you want to do this, but an e2e test for the feature would be great!

Do you have a suggestion for the e2e test? It seems that we deploy the VPA components to a kind cluster, then create/update/delete VPA objects, and finally assert their state. It is relatively easy to test a field in the VPA spec, but I can't find an example of an e2e test for such global flags/options.
The only option I see is to deploy VPA initially with the global max allowed flags. I am not sure whether this is desired, and it would also affect the other recommender tests. Or do you suggest patching the vpa-recommender deployment as part of the e2e test?

ialidzhikov · 2024-12-13T14:16:51Z

ERROR: (gcloud.compute.networks.create) Could not fetch resource:
 - Error 502 (Server Error): The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.

/test pull-kubernetes-e2e-autoscaling-vpa-full

adrianmoisey · 2024-12-15T12:53:45Z

Unsure if you want to do this, but an e2e test for the feature would be great!

Do you have a suggestion for the e2e test? It seems that we deploy the VPA components to a kind cluster, then create/update/delete VPA objects, and finally assert their state. It is relatively easy to test a field in the VPA spec, but I can't find an example of an e2e test for such global flags/options. The only option I see is to deploy VPA initially with the global max allowed flags. I am not sure whether this is desired, and it would also affect the other recommender tests. Or do you suggest patching the vpa-recommender deployment as part of the e2e test?

It seems that at the moment we don't have a similar test to base yours off, so I think one would need to be written to deploy the recommender with a global max, and ensure that it doesn't go above that max.
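A rough sketch of the assertion such a test could make is below. It is not based on the existing e2e framework: the `containerTarget` type and `exceedsGlobalMax` helper are hypothetical, and the numeric values only stand in for whatever global max the recommender would be deployed with.

```go
package main

import "fmt"

// containerTarget is a hypothetical, simplified view of one container's
// recommended target (CPU in millicores, memory in bytes).
type containerTarget struct {
	Name        string
	CPUMilli    int64
	MemoryBytes int64
}

// exceedsGlobalMax returns the names of containers whose target is above
// the configured global maximum for CPU or memory.
func exceedsGlobalMax(targets []containerTarget, maxCPUMilli, maxMemoryBytes int64) []string {
	var offenders []string
	for _, t := range targets {
		if t.CPUMilli > maxCPUMilli || t.MemoryBytes > maxMemoryBytes {
			offenders = append(offenders, t.Name)
		}
	}
	return offenders
}

func main() {
	targets := []containerTarget{
		{Name: "app", CPUMilli: 4000, MemoryBytes: 8 << 30},
		{Name: "sidecar", CPUMilli: 4500, MemoryBytes: 1 << 30},
	}
	// A passing test would expect an empty list here.
	fmt.Println(exceedsGlobalMax(targets, 4000, 8<<30)) // prints [sidecar]
}
```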

It's going to be annoying to write such a test, but I think it will help us in the long run.

What do you think?

adrianmoisey · 2024-12-15T12:54:33Z

I'm going to give this a
/lgtm

I like the idea of an e2e test, but will let an approver decide if it's needed or not

raywainman

Thanks for this!! Sorry for the delay!

raywainman · 2024-12-16T18:44:17Z

vertical-pod-autoscaler/pkg/utils/vpa/capping.go

+				} else {
+					// Set resources from the global maxAllowed if the VPA maxAllowed is missing them.
+					for resourceName, quantity := range globalMaxAllowed {
+						if _, ok := maxAllowed[resourceName]; !ok {


Let's add a comment here that we only override this if the user did not explicitly set a maximum in their container policy in the VPA.

There is already

autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/capping.go

Line 210 in 04e3340

// Set resources from the global maxAllowed if the VPA maxAllowed is missing them.

which is more or less stating the same thing.
Let me know if you have suggestions for improvements.
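For readers following along, the fallback behavior discussed here can be sketched as follows. This is a simplified illustration, not the code in `capping.go`: `resourceList` is a hypothetical stand-in for `apiv1.ResourceList` with plain `int64` values, and `effectiveMaxAllowed` is an assumed name. A global value is used only for resources that the VPA's container policy does not set.

```go
package main

import "fmt"

// resourceList is a simplified stand-in for apiv1.ResourceList.
type resourceList map[string]int64

// effectiveMaxAllowed starts from the VPA's own maxAllowed and fills in
// only the resources it is missing from the global maxAllowed.
func effectiveMaxAllowed(vpaMax, globalMax resourceList) resourceList {
	effective := make(resourceList, len(vpaMax)+len(globalMax))
	for name, quantity := range vpaMax {
		effective[name] = quantity
	}
	for name, quantity := range globalMax {
		// An explicit per-VPA maximum always wins over the global one.
		if _, ok := effective[name]; !ok {
			effective[name] = quantity
		}
	}
	return effective
}

func main() {
	vpaMax := resourceList{"cpu": 2000}                       // millicores, set in the VPA
	globalMax := resourceList{"cpu": 4000, "memory": 8 << 30} // hypothetical flag values
	effective := effectiveMaxAllowed(vpaMax, globalMax)
	fmt.Println(effective["cpu"], effective["memory"]) // prints 2000 8589934592
}
```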

k8s-ci-robot · 2024-12-18T09:40:18Z

New changes are detected. LGTM label has been removed.

ialidzhikov · 2024-12-18T09:42:48Z

Hi @raywainman ,

I addressed your review feedback in caa47f9.

k8s-ci-robot added the kind/feature label ("Categorizes issue or PR as related to a new feature.") Dec 4, 2024

k8s-ci-robot requested a review from jbartosik December 4, 2024 13:48

k8s-ci-robot added the area/vertical-pod-autoscaler label Dec 4, 2024

k8s-ci-robot requested a review from raywainman December 4, 2024 13:48

k8s-ci-robot added the cncf-cla: yes label ("Indicates the PR's author has signed the CNCF CLA.") Dec 4, 2024

k8s-ci-robot added the size/L label ("Denotes a PR that changes 100-499 lines, ignoring generated files.") Dec 4, 2024

ialidzhikov mentioned this pull request Dec 4, 2024

vpa-recommender: Make max allowed recommendation configurable #7147

Open

ialidzhikov force-pushed the enh/global-max-allowed-flags-post-processor branch 2 times, most recently from 9967539 to 0a79197, December 4, 2024 14:02

k8s-ci-robot added the needs-rebase label ("Indicates a PR cannot be merged because it has merge conflicts with HEAD.") Dec 4, 2024

ialidzhikov force-pushed the enh/global-max-allowed-flags-post-processor branch from 0a79197 to af3a158, December 4, 2024 14:29

k8s-ci-robot removed the needs-rebase label ("Indicates a PR cannot be merged because it has merge conflicts with HEAD.") Dec 4, 2024

adrianmoisey reviewed Dec 4, 2024

vertical-pod-autoscaler/pkg/recommender/main.go (outdated, resolved)

ialidzhikov added a commit to ialidzhikov/gardener that referenced this pull request Dec 4, 2024

[drop me] Update VPA components to v1.3.0-dev-af3a158def012d5aab41e6114e06e613e531f7dd (b620ce2)

This PR adopts the changes from kubernetes/autoscaler#7560.

ialidzhikov requested a review from adrianmoisey December 6, 2024 08:06

adrianmoisey reviewed Dec 8, 2024

vertical-pod-autoscaler/pkg/utils/vpa/capping.go (outdated, resolved)

adrianmoisey reviewed Dec 8, 2024

ialidzhikov added 3 commits December 13, 2024 14:33

vpa-recommender: Add support for configuring global max allowed resources (a4ec949)

Rename --max-allowed-{cpu,memory} to `--container-recommendation-max-allowed-{cpu,memory}` (dd5add6)

Address review comment from adrianmoisey (04e3340)

ialidzhikov force-pushed the enh/global-max-allowed-flags-post-processor branch from 12f42ee to 04e3340, December 13, 2024 12:33

ialidzhikov requested a review from adrianmoisey December 13, 2024 14:17

k8s-ci-robot assigned adrianmoisey Dec 15, 2024

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) Dec 15, 2024

raywainman reviewed Dec 16, 2024

Address review feedback from raywainman (caa47f9)

k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) Dec 18, 2024

ialidzhikov requested a review from raywainman December 18, 2024 14:47
