
Add a multi-dimensional pod autoscaler #7550

Open
wants to merge 39 commits into base: master

Conversation

davidspek (Contributor)

Revival of: #5999

What type of PR is this?

/kind feature

What this PR does / why we need it:

Multidimensional pod autoscaling: https://github.com/kubernetes/autoscaler/blob/master/multidimensional-pod-autoscaler/AEP.md

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [AEP]: https://github.com/kubernetes/autoscaler/blob/master/multidimensional-pod-autoscaler/AEP.md

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2024
@k8s-ci-robot (Contributor)

Welcome @davidspek!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 2, 2024
@k8s-ci-robot (Contributor)

Hi @davidspek. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: davidspek
Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 2, 2024
@davidspek (Contributor, Author)

/cc @jbartosik since you reviewed the original PR.

@davidspek davidspek force-pushed the mpa branch 5 times, most recently from 891012a to 7321b16 Compare December 2, 2024 12:05
@davidspek (Contributor, Author)

@adrianmoisey Do you think you could have a look at this?

@adrianmoisey (Member)

> @adrianmoisey Do you think you could have a look at this?

I'm happy to take a look, but I can't approve it.

I do have 2 initial concerns:

  1. Who will maintain this long term? At the moment the VPA is very under-maintained, and getting changes in is very difficult. Adding an MPA will only add to the burden on an already suffering project.
  2. Overlap with the VPA. This project seems very duplicative of the VPA. Having to keep two similar projects in sync may be difficult.

I don't speak on behalf of sig-autoscaling though.

@davidspek (Contributor, Author)

Regarding the overlap with the VPA, there is definitely some overlap, but I believe part of that is just scripts which can be deduplicated in the future. @James-QiuHaoran can likely provide better comment here as he implemented the original PR.

In terms of maintenance, that will always be a difficult question to answer at the start of something new. I revived this PR since there was an approved AEP but the implementation was stuck in a PR. Personally, I think the Multi-dimensional Pod Autoscaler could be a very popular project, since it solves the difficulty people have with the inherent coupling of resource requests and the HPA. Having the MPA could make the lives of platform teams much easier, so I am hopeful that getting it included and having people start using it will help the project gain traction.

@adrianmoisey (Member)

> In terms of maintenance, that will always be a difficult question to answer at the start of something new. I revived this PR since there was an approved AEP but the implementation was stuck in a PR. Personally, I think the Multi-dimensional Pod Autoscaler could be a very popular project, since it solves the difficulty people have with the inherent coupling of resource requests and the HPA. Having the MPA could make the lives of platform teams much easier, so I am hopeful that getting it included and having people start using it will help the project gain traction.

I think this is worth bringing up in the sig-autoscaling meeting, just to see how others feel about it.

@raywainman (Contributor)

Thanks a lot David!

I'm having trouble loading all the files in this PR :( Wonder if it is possible to split out the vendor files into a different PR?

I'd really love to dig into the logic of how the vertical and horizontal autoscaling recommendations are generated and actuated since the original AEP skims over this part a bit.

+1 to Adrian, we should discuss this in the SIG meeting - if anything to raise awareness and gauge additional interest of folks who might want to use this or help out!

Some random thoughts on maintenance:

  • We could de-risk this causing additional burden on VPA by releasing it as a separate "thing" (for lack of a better term here). Although architecturally it might be nice to keep them combined. Would love to discuss this.
  • This could live in a "beta/alpha" state for a while during development to gauge interest. This would remove any kind of expectation around maintenance from users?

@raywainman (Contributor)

> I'm having trouble loading all the files in this PR :( Wonder if it is possible to split out the vendor files into a different PR?

Even temporarily while we look at the core logic, then we can bring vendored files back in afterwards.

@adrianmoisey (Member)

> I'm having trouble loading all the files in this PR :( Wonder if it is possible to split out the vendor files into a different PR?

This would be very helpful!

> I'd really love to dig into the logic of how the vertical and horizontal autoscaling recommendations are generated and actuated since the original AEP skims over this part a bit.

I'm also going to focus on this

> This could live in a "beta/alpha" state for a while during development to gauge interest. This would remove any kind of expectation around maintenance from users?

I was wondering about this too. Putting it into this repo may make it "official", so calling it out as experimental (at least for now) may help

One thing I did notice so far is that this doesn't seem to have e2e tests. I think it makes sense to try to get decent test coverage for a new tool, especially if we're low on maintainers.

@davidspek (Contributor, Author)

Thanks for the replies. I think we're all on the same page here. My intention was also to get this contribution into the mix to be able to properly gauge interest, which is difficult without it actually being something that is released and people can play with. At the same time, it most definitely should be released as an alpha or beta without any guarantees, with very clear messaging to that effect.

I also don't think this should be a burden on the maintainers of the VPA, and it should be released independently. However, if this proves to be popular it might make sense to do a joint refactor with the VPA so more code can be reused and forces can be joined. Personally, I could see a world where the MPA becomes more popular than the VPA, since it also does horizontal scaling, which is the most popular form of autoscaling.

In terms of end-to-end testing, I agree that is something which definitely should be added. However, since my time for this will probably be quite limited and writing good tests takes time, I hope it is possible to separate that from the initial contribution as well, to get the ball rolling.

I've also removed the vendored files so it should be easier to review the PR now.

Comment on lines 111 to 117
```go
err = api_util.CreateOrUpdateMpaCheckpoint(writer.vpaCheckpointClient.VerticalPodAutoscalerCheckpoints(mpa.ID.Namespace), &vpaCheckpoint)
if err != nil {
	klog.Errorf("Cannot save MPA %s/%s checkpoint for %s. Reason: %+v",
		mpa.ID.Namespace, vpaCheckpoint.Spec.VPAObjectName, vpaCheckpoint.Spec.ContainerName, err)
} else {
	klog.V(3).Infof("Saved MPA %s/%s checkpoint for %s",
		mpa.ID.Namespace, vpaCheckpoint.Spec.VPAObjectName, vpaCheckpoint.Spec.ContainerName)
```
Member

This depends on the VPA's CRDs to be installed in the cluster.
Is this intended?

Contributor Author

I was actually looking at this myself as well, and it is something I tried to verify, but I was running into an issue with KinD where I couldn't register the CRD nicely. The MPA was already working in that it scaled up the pod, but maybe it was working incorrectly. Do you think it makes sense to have a separate checkpoint CRD for the MPA? I'm also not sure if other code that is imported from the VPA might require the VPA checkpoint to be used.

Member

Having a project depend on another project's CRDs may create an awkward dependency between the two projects.

I'm not sure of the best way forward; maybe for now it's worth noting that the MPA depends on having the VPA CRDs installed?

Contributor Author

I was thinking it should be possible to import the type of the VPA checkpoint as the MPA checkpoint and generate it that way. I'll give that a quick try in a moment. If at some point the MPA and VPA start converging more it might make sense to just have the one resource.
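The idea of reusing the VPA checkpoint type could be sketched with a Go type alias. This is a hypothetical illustration only: the stand-in struct below is a simplified version of the real `VerticalPodAutoscalerCheckpointSpec` from the vertical-pod-autoscaler API package, and the MPA-side alias name is assumed.

```go
package main

import "fmt"

// Simplified stand-in for the VPA's checkpoint spec; the real type
// lives in the vertical-pod-autoscaler apis package and has more fields.
type VerticalPodAutoscalerCheckpointSpec struct {
	VPAObjectName string
	ContainerName string
}

// A type alias lets the MPA reuse the VPA's schema while still
// registering the checkpoint under its own CRD group/kind.
type MultidimPodAutoscalerCheckpointSpec = VerticalPodAutoscalerCheckpointSpec

func main() {
	// Values of the alias are interchangeable with the original type,
	// so imported VPA code that expects the VPA spec keeps working.
	spec := MultidimPodAutoscalerCheckpointSpec{
		VPAObjectName: "example-mpa",
		ContainerName: "app",
	}
	var vpaSpec VerticalPodAutoscalerCheckpointSpec = spec
	fmt.Println(vpaSpec.ContainerName)
}
```

Because an alias (as opposed to a named type definition) is identical to the original type, no conversion code would be needed at the boundary between the two projects.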

Contributor Author

I've now added the mpacheckpoint CRD and should have the code migrated to that. This should allow somewhat cleaner separation, at least from a deployment standpoint.

@davidspek davidspek force-pushed the mpa branch 2 times, most recently from a579fdb to a4feb37 Compare December 10, 2024 12:35
Signed-off-by: David van der Spek <[email protected]>
Comment on lines +40 to +43
```go
// GetHumanizeMemory returns the value of the HumanizeMemory flag.
func GetHumanizeMemory() bool {
	return *humanizeMemory
}
```
Member

I'm curious what the purpose of this change is?
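For context, the lines in question follow a common Go pattern: a package-private flag exposed through a read-only accessor. A minimal self-contained sketch (the flag registration and help text here are assumed for illustration, not taken from the PR):

```go
package main

import (
	"flag"
	"fmt"
)

// Assumed registration: the flag pointer is unexported, so other
// packages cannot mutate or redefine it directly.
var humanizeMemory = flag.Bool("humanize-memory", false,
	"Convert memory values in recommendations to a humanized unit.")

// GetHumanizeMemory returns the value of the HumanizeMemory flag.
func GetHumanizeMemory() bool {
	return *humanizeMemory
}

func main() {
	flag.Parse()
	fmt.Println(GetHumanizeMemory())
}
```

The getter keeps the flag's storage private to its package while letting other components read the parsed value after flag.Parse() has run.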

Labels
area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants