vpa-recommender: Make max allowed recommendation configurable #7147

Open
ialidzhikov opened this issue Aug 8, 2024 · 9 comments · May be fixed by #7560
Labels: area/vertical-pod-autoscaler, kind/feature

Comments

@ialidzhikov
Contributor

ialidzhikov commented Aug 8, 2024

Which component are you using?:

vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Currently, it is possible to configure the max allowed recommendation per VPA resource using the field .spec.resourcePolicy.containerPolicies[].maxAllowed. Setting this field is useful to make sure that recommendations won't exceed the largest Node's allocatable. Otherwise, a high recommendation can make the Pod unschedulable, which means downtime for the corresponding workload.
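
For illustration, a minimal sketch of setting this field via the VPA Go API types (the container name and resource values are made up; the equivalent YAML manifest expresses the same policy):

package policyexample

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// maxAllowedPolicy caps recommendations for one container so that they fit on
// the largest node in the cluster (illustrative values).
func maxAllowedPolicy() *vpa_types.PodResourcePolicy {
	return &vpa_types.PodResourcePolicy{
		ContainerPolicies: []vpa_types.ContainerResourcePolicy{
			{
				ContainerName: "main", // hypothetical container name
				MaxAllowed: corev1.ResourceList{
					corev1.ResourceCPU:    resource.MustParse("7"),
					corev1.ResourceMemory: resource.MustParse("28Gi"),
				},
			},
		},
	}
}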

It seems that GKE's VPA already offers such a feature on top of the open-source VPA:

GKE vertical Pod autoscaling provides the following benefits over the Kubernetes open source autoscaler:

  • Takes maximum node size and resource quotas into account when determining the recommendation target.

We would like to make vpa-recommender aware of the maximum node size and prevent it from providing recommendations that would make the corresponding Pods unschedulable.

Describe the solution you'd like.:

Does it make sense to provide flags in vpa-recommender for the maximum recommendation it can make for a Pod? I see that there are already similar flags for the minimum allowed CPU and memory:

podMinCPUMillicores = flag.Float64("pod-recommendation-min-cpu-millicores", 25, `Minimum CPU recommendation for a pod`)
podMinMemoryMb = flag.Float64("pod-recommendation-min-memory-mb", 250, `Minimum memory recommendation for a pod`)

PS: However, for the min allowed recommendation the flags are per Pod, and the minimum is distributed across the Pod's containers based on the container count. See:

fraction := 1.0 / float64(len(containerNameToAggregateStateMap))
minResources := model.Resources{
	model.ResourceCPU:    model.ScaleResource(model.CPUAmountFromCores(*podMinCPUMillicores*0.001), fraction),
	model.ResourceMemory: model.ScaleResource(model.MemoryAmountFromBytes(*podMinMemoryMb*1024*1024), fraction),
}

I am not sure if the same thing would work well for Pods that have multiple containers under VPA. For example, for a Pod with 2 containers, each container would be capped at 50% of the configured max allowed value, even if one of them needs far more than the other.
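
To make that concern concrete, here is a minimal sketch (the pod-level max flag is hypothetical, mirroring the existing min flags) of how an even, fraction-based distribution would cap each container of a 2-container Pod at half of the configured pod-level maximum:

package main

import "fmt"

// capPerContainer distributes a hypothetical pod-level maximum evenly across
// containers, mirroring how the existing pod-level minimum flags are split.
func capPerContainer(podMaxMemoryMb float64, containerCount int) float64 {
	fraction := 1.0 / float64(containerCount)
	return podMaxMemoryMb * fraction
}

func main() {
	// With a hypothetical --pod-recommendation-max-memory-mb=8192 and a Pod with
	// 2 containers, each container is capped at 4096 MB, even if one of them
	// needs far more memory than the other.
	fmt.Println(capPerContainer(8192, 2)) // 4096
}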

Describe any alternative solutions you've considered.:

Other folks may share the way they are tackling this issue.

One alternative we are trying to avoid is to have a mutating webhook on VPA that sets maxAllowed per VPA resource.

Additional context.:

This topic was also brought up in the community meeting on 2024-06-17.

ialidzhikov added the kind/feature label on Aug 8, 2024
@ialidzhikov
Contributor Author

/area vertical-pod-autoscaler

@raywainman
Contributor

raywainman commented Aug 12, 2024

Thanks for submitting this!

This makes sense and is nice and simple; as you mention, it's very analogous to the min flags.

One possible alternative that we didn't talk about during the SIG meeting was to use the ContainerResourcePolicy.MaxAllowed field to cap the recommendations, though the downside to using this is that you would need every single VPA in the cluster to have this field set.

Regarding this proposal, as we talked about, the downside is that you need to set this flag manually, and its value could change over the lifetime of your cluster as differently sized nodes are adopted.

If you were to set this "max" value dynamically you would want to make this adjustment in:

  • the updater to make sure it can only evict a pod if there is enough room in the cluster
  • the admission-controller to trim down and cap recommendations on new or restarted pods to make sure they can fit in what is available in the cluster

However, doing this statically cluster wide is a great first step and seems totally reasonable to me!

@vlerenc

vlerenc commented Sep 25, 2024

That would be very helpful. Usually, the configuration of the cluster and its VPA installation is done by the same (human or technical) operators. It would be more difficult to change all VPA resources, especially if they are controlled by third-party (technical) operators that we cannot influence in all aspects. The alternative, if this doesn't become a global VPA option, is to create a mutating webhook, but then everybody would have to do that. If VPA offered this feature out of the box, that would be great. After all, maxAllowed seems to have predominantly this use case: to make the recommender not recommend anything that wouldn't fit. The definition of the cluster and its worker pools is usually handled by other people than the developers of the workload and their VPA resources - at least that's a pattern we see.

As for dynamically following the larger/largest nodes, I am doubtful that too much intelligence will pay off. In our case it wouldn't, because we (a) use worker pool scale-from-zero, i.e. VPA cannot see the possible available machine sizes at all if it just checks the nodes, and (b) it is challenging to understand which nodes are really suitable (taints/tolerations, affinities/anti-affinities, etc.). The latter point could be overcome, the first one never. Therefore, static configuration of VPA's global maxAllowed sounds simple and reasonable - more intelligence may help, but isn't guaranteed to always help.

@omerap12
Member

omerap12 commented Oct 4, 2024

Here’s how I see the options for supporting this:

  1. Iterate through the cluster nodes and find the maximum allocatable resources per node (see the sketch after this list). The downside is that VPA won’t be able to account for "upcoming nodes" (nodes that are expected to join the cluster via the cluster autoscaler). I've started working on this approach here: [WIP] VPA: Takes maximum node size into account (upper-bound) #7345.
  2. Iterate through both the current cluster nodes and the "upcoming nodes."
  3. Implement a mutating webhook on VPA that sets a maxAllowed value per VPA resource (as suggested by the issue author).
  4. Use a fixed value, similar to how pod-recommendation-min-cpu-millicores and pod-recommendation-min-memory-mb are handled.
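
For option 1, a minimal sketch (assuming a pre-built client-go clientset and omitting the details of #7345) of deriving the largest per-node allocatable CPU and memory from the current nodes:

package nodesize

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// maxAllocatable returns the largest allocatable CPU and memory found on any
// current node. Note that "upcoming" (e.g. scale-from-zero) nodes are invisible
// to this approach.
func maxAllocatable(ctx context.Context, client kubernetes.Interface) (cpu, mem resource.Quantity, err error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return cpu, mem, err
	}
	for _, node := range nodes.Items {
		if c := node.Status.Allocatable[corev1.ResourceCPU]; c.Cmp(cpu) > 0 {
			cpu = c
		}
		if m := node.Status.Allocatable[corev1.ResourceMemory]; m.Cmp(mem) > 0 {
			mem = m
		}
	}
	return cpu, mem, nil
}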

cc @adrianmoisey

@adrianmoisey
Member

Related issue: #6487

@ialidzhikov
Contributor Author

ialidzhikov commented Nov 26, 2024

After spending some time on this, I found that it is not straightforward to add Pod-level flags for max allowed CPU and memory recommendations.

I already raised the concern in the initial issue description:

PS: However, for the min allowed recommendation the flags are per Pod, and the minimum is distributed across the Pod's containers based on the container count. See:

fraction := 1.0 / float64(len(containerNameToAggregateStateMap))
minResources := model.Resources{
	model.ResourceCPU:    model.ScaleResource(model.CPUAmountFromCores(*podMinCPUMillicores*0.001), fraction),
	model.ResourceMemory: model.ScaleResource(model.MemoryAmountFromBytes(*podMinMemoryMb*1024*1024), fraction),
}

I am not sure if the same thing would work well for Pods that have multiple containers under VPA. For example, for a Pod with 2 containers, each container would be capped at 50% of the configured max allowed value, even if one of them needs far more than the other.

As end users of VPA, it would be great to be able to specify Pod-level maximum allowed values, because scheduling is based on Pod resource requests. That way, end users can make sure that VPA won't recommend values that make Pods unschedulable.
On the other hand, recommendations in VPA are on container level, and VPA can be enabled for more than one container in a Pod. The question is how VPA would be expected to distribute a Pod-level max allowed recommendation across the containers in the Pod. Some approaches are: distribute the configured max allowed values evenly; distribute them proportionally (proportionally based on what?). My conclusion is that VPA should not do that, as it does not know about the workload.

Hence, I would propose to introduce container-level max allowed recommendation flags. Yes, with this approach an end user can theoretically still end up with unschedulable Pods due to VPA recommendations exceeding the Node's allocatable.

Example: the cluster Node's allocatable is 7Gi, while a Pod has 2 containers, each with a recommendation of 4Gi. The Pod's memory request will be 8Gi, which exceeds the Node's allocatable of 7Gi.

However, I think the container-level max allowed proposal is good enough because:

  • it is not common for autoscaling to be enabled for many containers in a Pod. Even when it is, there is usually one main container, while the others are not likely to need upscaling the way the main one does
  • it is not a perfect solution, but it is a practical one that will address the majority of the cases caused by this well-known VPA limitation
  • even with VPA's maxAllowed field it is not possible for the containers of a Pod to grow up to a max value while at the same time ensuring that the Pod remains schedulable.

That's why my proposal is to add container-level flags:

  • --container-recommendation-max-memory-mb
  • --container-recommendation-max-cpu-millicores

I would even propose to deprecate the existing Pod-level flags for min allowed recommendations in favor of container-level equivalents.

podMinCPUMillicores = flag.Float64("pod-recommendation-min-cpu-millicores", 25, `Minimum CPU recommendation for a pod`)
podMinMemoryMb = flag.Float64("pod-recommendation-min-memory-mb", 250, `Minimum memory recommendation for a pod`)
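
A minimal sketch of what the proposed container-level max flags and the corresponding capping could look like (the flag names follow the proposal above; the helper is hypothetical, not existing recommender code):

package recommender

import (
	"flag"
	"math"
)

var (
	// Proposed container-level maximums, analogous to the existing pod-level minimums.
	containerMaxCPUMillicores = flag.Float64("container-recommendation-max-cpu-millicores", 0, `Maximum CPU recommendation for a container (0 disables the cap)`)
	containerMaxMemoryMb      = flag.Float64("container-recommendation-max-memory-mb", 0, `Maximum memory recommendation for a container (0 disables the cap)`)
)

// capContainerRecommendation clamps a single container's recommendation
// (in millicores and MB) to the configured maximums, if they are set.
func capContainerRecommendation(cpuMillicores, memoryMb float64) (float64, float64) {
	if *containerMaxCPUMillicores > 0 {
		cpuMillicores = math.Min(cpuMillicores, *containerMaxCPUMillicores)
	}
	if *containerMaxMemoryMb > 0 {
		memoryMb = math.Min(memoryMb, *containerMaxMemoryMb)
	}
	return cpuMillicores, memoryMb
}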

Let me know what you think.

@adrianmoisey
Member

I like the proposal, but I worry about this:

Example: the cluster Node's allocatable is 7Gi, while a Pod has 2 containers, each with a recommendation of 4Gi. The Pod's memory request will be 8Gi, which exceeds the Node's allocatable of 7Gi.

I guess if one were to run a recommender for various Pods it could work.
For example:

  1. recommender-1 - VPA recommender for Pods with single containers
  2. recommender-2 - VPA recommender for Pods with 2 containers

I am wondering if the Pod-level resources KEP could actually help out here.
I haven't read the final proposal thoroughly yet, but the VPA could look at a combination of all containers' metrics and set a Pod-level request. In this case, it could keep the Pod's recommendation within the limit of an option such as --pod-recommendation-max-memory-mb.

@omerap12
Member

omerap12 commented Dec 3, 2024

I like the proposal, but I worry about this:

Example: the cluster Node's allocatable is 7Gi, while a Pod has 2 containers, each with a recommendation of 4Gi. The Pod's memory request will be 8Gi, which exceeds the Node's allocatable of 7Gi.

I guess if one were to run a recommender for various Pods it could work. For example:

  1. recommender-1 - VPA recommender for Pods with single containers
  2. recommender-2 - VPA recommender for Pods with 2 containers

I am wondering if the Pod-level resources KEP could actually help out here. I haven't read the final proposal thoroughly yet, but the VPA could look at a combination of all containers' metrics and set a Pod-level request. In this case, it could keep the Pod's recommendation within the limit of an option such as --pod-recommendation-max-memory-mb.

From my understanding, pod-level resource limits are optional, so we can't depend on them being enforced. However, we should account for them to ensure the total resource usage of all containers stays within the pod's defined limits.

Here's an approach we could consider—let me know your thoughts:

Assume a pod has a memory limit of 1 Gi, for example, and contains 3 containers.

  1. The recommender sums up the recommended memory for each container.

  2. If the total recommended memory is less than or equal to the pod's memory limit, no adjustment is needed.

  3. If the total exceeds the pod's memory limit, we normalize the recommendations. Each container would get a proportion of the pod's memory limit based on its recommendation:

    Adjusted Memory for Container i = $\left( \frac{x_i}{x_1 + x_2 + x_3} \right) \times \text{Pod Memory Limit}$

    Here, $x_i$ represents the memory recommendation for container i.
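
A minimal sketch of the normalization described in point 3 (a standalone helper, not actual VPA code):

package main

import "fmt"

// normalizeToPodLimit scales per-container memory recommendations proportionally
// so that their sum does not exceed the pod-level memory limit.
func normalizeToPodLimit(recommendations []float64, podLimitMb float64) []float64 {
	var total float64
	for _, r := range recommendations {
		total += r
	}
	if total <= podLimitMb {
		return recommendations // no adjustment needed
	}
	adjusted := make([]float64, len(recommendations))
	for i, r := range recommendations {
		adjusted[i] = r / total * podLimitMb
	}
	return adjusted
}

func main() {
	// A 1Gi (1024 MB) pod memory limit and three containers recommended
	// 600, 300 and 300 MB (1200 MB total) are scaled down to 512, 256 and 256 MB.
	fmt.Println(normalizeToPodLimit([]float64{600, 300, 300}, 1024))
}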

@ialidzhikov
Contributor Author

@raywainman @adrianmoisey, I created #7560.

I will be happy to receive feedback! :)
