
Deploying your own VPA with leader election enabled in GKE conflicts with the GKE system component #7461

Open
raywainman opened this issue Nov 4, 2024 · 25 comments
Labels
area/vertical-pod-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@raywainman
Contributor

raywainman commented Nov 4, 2024

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version: vertical-pod-autoscaler-1.2.0+

Leader election functionality was added in #6985 and is turned off by default

What k8s version are you using (kubectl version)?:

Any version 1.27+

What environment is this in?:

GKE

What did you expect to happen?:

Both the self-deployed VPA recommender and the GKE implementation of HPA should continue working.

What happened instead?:

Both the self-deployed VPA recommender and the GKE version use a lease called vpa-recommender in kube-system.

If you deploy your own VPA recommender, it might "steal" the lease and prevent the GKE implementation of HPA from working.

How to reproduce it (as minimally and precisely as possible):

  • Create a cluster
  • Deploy your own vpa-recommender (e.g. using vpa-up.sh). Make sure leader election is enabled (--leader-elect=true).
  • Do something to disrupt the control plane (for example upgrade the version).
  • Check whether your self-deployed vpa-recommender has grabbed the lease (see the check below); if so, HPA could stop working.
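
A rough way to check which pod has grabbed the lease (a sketch; it assumes the default lease name vpa-recommender and the kube-system namespace used above):

# Show the current holder of the contested lease (spec.holderIdentity is the leading pod)
% kubectl -n kube-system get lease vpa-recommender -o jsonpath='{.spec.holderIdentity}{"\n"}'

# Or dump the full Lease object, including acquire/renew timestamps
% kubectl -n kube-system get lease vpa-recommender -o yaml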

Anything else?

This is due to the unfortunate naming collision between GKE's system controller (also called vpa-recommender) and the one provided here.

@raywainman raywainman added the kind/bug Categorizes issue or PR as related to a bug. label Nov 4, 2024
@raywainman
Contributor Author

cc @adrianmoisey

Some thoughts on mitigating this:

  • Document the flags very carefully to encourage folks not to use the default when deploying on GKE.
  • Change the default here, which could possibly break some users.

@adrianmoisey
Member

I see two paths forward to fixing this:

  1. GKE stops using vpa-recommender as a lease name
  2. VPA stops using vpa-recommender as a lease name

I can't speak to what that lease is being used for in GKE, but I can only assume that changing it there is difficult or impossible.

Given that the lease(s) in VPA are only used for VPA components, and running multiple recommenders and updaters for a brief period isn't the worst thing in the world, my vote is that we change the default lease name in the VPA.

Any VPA configured with the lease enabled will only be running multiple pods for a short period of time, which should be fine.

It's obviously not an amazing path forward, but may be worth doing.

I'm curious what @voelzmo and @kwiesmueller think, as they may be the ones approving that controversial PR.

@adrianmoisey
Member

adrianmoisey commented Nov 5, 2024

my vote is that we change the default lease name in the VPA.

If we do go down this path, I suggest we also make PRs into 3rd-party Helm charts to ensure they support the new default name. Some of them hardcode the lease name.

@kwiesmueller
Member

I'm not sure if changing the default in OSS is the right approach because GKE claims it too.
Having documentation (GKE should maybe have a docs page for that too) seems like it should be enough.
Is there anything preventing GKE users from deploying a custom VPA with a different lease name?

cc. @raywainman

@adrianmoisey
Member

I'm not sure if changing the default in OSS is the right approach because GKE claims it too.

My concern is that if someone follows the happy path of deployment and enables the lease without reading the documentation, then GKE breaks in ways that are not obvious.

It took us 4 days and countless messages with GCP support before we found out what GCP was doing with the vpa-recommender lease (namely that the GKE HPA uses it).
It's also not possible (from what I understand) for a user to self-troubleshoot this problem (i.e. GKE users can't see the logs of the HPA controller).

The pain of renaming the default OSS lease is pretty small.

@adrianmoisey
Member

/area vertical-pod-autoscaler

@raywainman
Contributor Author

Just thinking out loud - what is the worst that can happen when changing a lease name?

On upgrade, we would have two vpa-recommender pods (one from the old version, one from the new version) that both believe they are the leader.

For a short period of time, both of them will try to compute and apply VPA recommendations. Off the top of my head, these should all generally be the same recommendations; we will just have the possibility of racing/duplication.

Is there any other risk anyone else can think of?

@adrianmoisey
Member

Just thinking out loud - what is the worst that can happen when changing a lease name?

Yup! That is my thought too.

For a short period of time, both of them will try to compute and apply VPA recommendations. Off the top of my head, these should all generally be the same recommendations; we will just have the possibility of racing/duplication.

Correct! My understanding is that the worst that could happen is that 2 VPA recommenders would be giving roughly the same recommendation for a very short period of time, after which only 1 will be active.

And the VPA recommender's --recommender-interval flag is set to 1 minute by default, lowering that risk. There may be a few other parts I'm forgetting, such as the OOM detector, but I don't think that's a big deal either.

@raywainman
Contributor Author

@ialidzhikov what do you think?

My vote is to change it now before the next release (we could even consider a cherrypick here) given that:

  • This feature is fairly new, we have an opportunity to fix this before it gains wider adoption.
  • Leader election is off by default.
  • We can try and be loud in the release notes about this change to warn users in case they see strange behavior.
  • The theoretical impact of having two leaders running temporarily seems quite minimal.

@raywainman
Contributor Author

cc @voelzmo as well

@raywainman
Contributor Author

raywainman commented Nov 6, 2024

Could something maybe go wrong with the checkpoint?

Perhaps if the new-version leader writes a checkpoint that the old-version leader doesn't understand?

EDIT: Looks like we have support for this case.

@adrianmoisey
Member

@ialidzhikov what do you think?

My vote is to change it now before the next release (we could even consider a cherrypick here) given that:

  • This feature is fairly new, we have an opportunity to fix this before it gains wider adoption.
  • Leader election is off by default.
  • We can try and be loud in the release notes about this change to warn users in case they see strange behavior.
  • The theoretical impact of having two leaders running temporarily seems quite minimal.

I agree with all these points.
Also of note: the VPA documentation isn't very good about telling people how to run the VPA. The upside is that no documentation was really updated to explain the lease feature (besides the update to the parameters: 211ecdc). I recently added #7463, but that's only a few days old.

If we're going to change the lease name, I think now is the perfect time to do so.

@ialidzhikov
Contributor

ialidzhikov commented Nov 7, 2024

Hi folks,

% docker run registry.k8s.io/autoscaling/vpa-recommender:1.2.1 --help

      --leader-elect                                         Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.
      --leader-elect-lease-duration duration                 The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
      --leader-elect-renew-deadline duration                 The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than the lease duration. This is only applicable if leader election is enabled. (default 10s)
      --leader-elect-resource-lock string                    The type of resource object that is used for locking during leader election. Supported options are 'leases', 'endpointsleases' and 'configmapsleases'. (default "leases")
      --leader-elect-resource-name string                    The name of resource object that is used for locking during leader election. (default "vpa-recommender")
      --leader-elect-resource-namespace string               The namespace of resource object that is used for locking during leader election. (default "kube-system")
      --leader-elect-retry-period duration                   The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)

The leader election resource name and namespace are configurable. If you are installing a custom VPA to a GKE cluster, please use the --leader-elect-resource-namespace and --leader-elect-resource-name flags to avoid using the same lease that GKE's VPA is using.
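
For example, here is a hedged sketch of wiring those flags into a self-deployed recommender. The deployment name/namespace and the my-vpa-recommender lease name are illustrative and depend on how you installed the VPA (e.g. via vpa-up.sh); it also assumes the container already passes its flags via args:

# Append the lease override flags to the recommender's existing args (illustrative names)
% kubectl -n kube-system patch deployment vpa-recommender --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--leader-elect-resource-name=my-vpa-recommender"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--leader-elect-resource-namespace=kube-system"}
]'

Note that if the recommender's RBAC restricts lease access via resourceNames (as some Helm charts do), the new lease name needs to be added there as well.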

@ialidzhikov
Contributor

I am also open to renaming the default lease name in the VPA repo to avoid this issue.

@kwiesmueller
Member

Alright, as it's still new and it's an easy way to avoid issues before too many people use it, I guess changing it now is fine.
I am a little worried about breaking people who already use it, so I'm not sure about backporting it (it doesn't seem correct under semantic versioning).

@raywainman
Contributor Author

We'll need to update this: https://github.com/cowboysysop/charts/blob/master/charts/vertical-pod-autoscaler/templates/recommender/clusterrole.yaml#L141-L142 which was added in cowboysysop/charts#705.

If we combine it with a release then this should be less painful.

@raywainman
Contributor Author

Opened cowboysysop/charts#781 to update the Helm chart.

adrianmoisey added a commit to adrianmoisey/helm-charts that referenced this issue Dec 3, 2024
…se" to avoid conflicts

See kubernetes/autoscaler#7461.

The default lock name can sometimes conflict with a managed deployment of VPA/HPA which can cause issues on your cluster.

Here we are proposing changing the flag value in the Helm chart to a new name (`vpa-recommender-lease`). We are keeping the old resource name in the RBAC to avoid disruptions or issues during upgrade.
@adrianmoisey
Member

A summary of where we are with this:

  1. The lease name default has been changed - Change default lease name for vpa-recommender to "vpa-recommender-lease" #7498
  2. A PR for cowboysysop has been made, but not yet merged - Change default lease name for vpa-recommender to "vpa-recommender-lease" to avoid conflicts cowboysysop/charts#781 ⚠️
  3. A PR for stevehipwell has been made, but not yet merged - Change default lease name for vpa-recommender to "vpa-recommender-lease" to avoid conflicts stevehipwell/helm-charts#1090 ⚠️

We should try to get the bottom two items done before cutting a release of the VPA.
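
Once the renamed default ships in a release, a rough way for anyone running a self-deployed VPA to confirm which lease their recommender is actually holding (a sketch; assumes the components use the default kube-system namespace):

# List VPA-related leases and their current holders
% kubectl -n kube-system get leases | grep -i vpa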

@voelzmo
Contributor

voelzmo commented Dec 3, 2024

We should try to get the bottom two items done before cutting a release of the VPA.

I don't think we need to. Helm charts decide when/how to pick up new upstream releases on their own, right? Is there any reason why a VPA release depends on changes made in Helm charts?

@adrianmoisey
Member

We should try to get the bottom two items done before cutting a release of the VPA.

I don't think we need to. Helm charts decide when/how to pick up new upstream releases on their own, right? Is there any reason why a VPA release depends on changes made in Helm charts?

There is certainly no hard dependency, but I worry that if someone decides to bump the Helm chart before merging in the lease rename PRs, then users of that Helm chart may have a broken VPA.

I don't think we have to wait, but I think it would be a nice thing to do (with some friendly nudges to the Helm chart maintainers if PRs aren't merged).

Basically, we're breaking a promise we had with them, so let's try to handle the name change as gracefully as possible.

@stevehipwell
Contributor

@adrianmoisey speaking as the maintainer of the stevehipwell Helm chart: the charts shouldn't be modified until the release is out. As long as the CHANGELOG documents the lease name change (and maybe includes a warning about requiring both for the update), it will be captured.

On to the wider issue here: is there a reason why the lease name wasn't made an optional arg? This would have the least impact on all of the existing VPA users, most of whom are unlikely to be on GCP. And if the change was being made, is there a reason why it wasn't made consistently across all the components? Nothing makes long-term maintenance harder than unexpected inconsistency.

@adrianmoisey
Member

speaking as the maintainer of the stevehipwell Helm chart: the charts shouldn't be modified until the release is out. As long as the CHANGELOG documents the lease name change (and maybe includes a warning about requiring both for the update), it will be captured.

Good to know, thanks!

On to the wider issue here: is there a reason why the lease name wasn't made an optional arg?

The lease functionality was defaulted to "off", and the lease name was an arg, with a default set.

And also is there a reason why if the change was being made that it wasn't made consistently across all the components?

What do you mean? The lease was rolled out for all 3 VPA components.

@stevehipwell
Contributor

@adrianmoisey sorry I missed the original state of the system regarding existing args, but that leads me to question why a code change was required at all? I can see why support for using an alternate lease name would be useful, but it looks to me like this could be a manifest only change? And I'm not sure I understand why the default needs to change rather than document the edge case? My Helm chart can (and will) easily support modifying the default lease names.

What do you mean? The lease was rolled out for all 3 VPA components.

I can only see a change to the default name for the recommender in #7498, are there other PRs for the other components to add the -lease suffix?

@adrianmoisey
Member

sorry I missed the original state of the system regarding existing args, but that leads me to question why a code change was required at all? I can see why support for using an alternate lease name would be useful, but it looks to me like this could be a manifest only change? And I'm not sure I understand why the default needs to change rather than document the edge case? My Helm chart can (and will) easily support modifying the default lease names.

There were a few factors in the decision:

  1. When using this lease name on GKE, the side effects are very surprising and not easy to debug or troubleshoot.
  2. The lease is currently off by default and a fairly new feature, so we assume not many people are using it yet.
  3. Changing the default doesn't cause harm for the recommender. Worst case, if multiple recommenders run, there's a small possibility that you get subtly different recommendations. We believe this won't happen during a deployment rollout.

We basically concluded that the pain of changing the lease name now was lower than the pain of someone having troubles when running on GKE.

I can only see a change to the default name for the recommender in #7498, are there other PRs for the other components to add the -lease suffix?

Those names don't conflict with GKE, so there's no need to change them.

@stevehipwell
Contributor

I can see that it's easier, but is it correct? This looks like a GKE issue, and what's to stop them from modifying their deployment to change their lease name if they're tracking the changes in this repo blindly? For managed components they should, at a minimum, have a unique lease defined for this very reason.

Those names don't conflict with GKE, so there's no need to change them.

I think this is a mistake. If you're making a change, skipping the other components lowers the complexity today (a one-off write cost) but increases it for the life of the project (a perpetual read cost). I would suggest that the lease names be aligned across all components.

Anyway, and please don't take my feedback the wrong way, I'll make sure to update the Helm chart when the next VPA release is out. I'll likely make the lease name configurable, so even if GKE takes the new name there will be a way to work around it.
