
The managed-by annotation should be removed for AWS deployments #668

Open
a13x5 opened this issue Aug 7, 2024 · 7 comments
@a13x5 commented Aug 7, 2024

The cluster.x-k8s.io/managed-by: k0smotron annotation is required, as described in the k0smotron documentation,

which explicitly says:

This marks the base infra to be self managed. The value of the annotation is irrelevant, as long as there is a value.
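For reference, this is roughly how the documented annotation ends up on the infrastructure cluster object (a sketch only; the AWSCluster name kk-1 is just an example, and per the docs the value can be anything):

# set the managed-by annotation on the AWSCluster so CAPA treats it as externally managed
kubectl annotate awscluster kk-1 cluster.x-k8s.io/managed-by=k0smotron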

In the CAPI docs this annotation is explained as meaning “that some external system is managing the cluster infrastructure“. In this case that means k0smotron should be responsible for creating the AWS resources, which it doesn’t do.

With the annotation in place, the Cluster API AWS provider just skips reconciliation and creation of all AWS resources. Workers will not be created until we manually set the .status.ready field of the AWSCluster to true. Certain resources (like public IPs), however, still depend on a proper AWSCluster reconcile, so workers with public IPs will not be created.
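For illustration, the manual workaround looks roughly like this (a sketch; the cluster name is just an example, and kubectl must be new enough to support --subresource):

# mark the externally managed AWSCluster as ready so Machine creation can proceed
kubectl patch awscluster kk-1 --subresource=status --type=merge -p '{"status":{"ready":true}}'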

All of this significantly complicates deployment, since certain parts of the automation are disabled. What's more, with the annotation removed everything works just fine.

What was the purpose of adding it to the docs? Unless removing it poses significant drawbacks, it should be removed completely.

@jnummelin (Member)

That annotation is/was used because there was a 🐔 and 🥚 problem with the CAPA provider. When we let it create all the resources, which includes the ELB for the CP, it also populates the controlPlaneEndpoint field. And as that field is marked as immutable, k0smotron was/is not able to override it with the hosted control plane address (the LB service).
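To make the clash concrete, a rough sketch (hypothetical cluster name kk-1): once CAPA has populated the endpoint, any later attempt to point it at the hosted control plane is rejected by the validating webhook.

# show the endpoint CAPA wrote into the AWSCluster; the field cannot be changed afterwards
kubectl get awscluster kk-1 -o jsonpath='{.spec.controlPlaneEndpoint.host}:{.spec.controlPlaneEndpoint.port}'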

> What's more, with the annotation removed everything works just fine.

This is surprising, to me at least. We did quite extensive testing with AWS in the early days and always hit the above-mentioned problem. Maybe they've changed something on the CAPA provider side. Did you test with both hosted control planes and CPs using Machines?

@a13x5 (Author) commented Aug 9, 2024

Thanks for the answer!

Yes, now I see what you're talking about.

The CAPA controller tries to create the LB and fails to do so, because it wants to update the endpoint field which was already filled by k0smotron:

E0809 21:30:35.544488       1 controller.go:329] "Reconciler error" err="failed to patch AWSCluster default/kk-1: admission webhook \"validation.awscluster.infrastructure.cluster.x-k8s.io\" denied the request: AWSCluster.infrastructure.cluster.x-k8s.io \"kk-1\" is invalid: spec.controlPlaneEndpoint: Invalid value: kk-1-apiserver-92222222.us-west-1.elb.amazonaws.com:6443: field is immutable" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/kk-1" namespace="default" name="kk-1" reconcileID="19e1f45c-a166-45b4-a558-dabd58a9f81e"

And then it just marks the cluster as defunct somehow:

E0809 21:59:12.328636       1 controller.go:329] "Reconciler error" err="no loadbalancer exists for the AWSCluster kk-1, the cluster has become unrecoverable and should be deleted manually: no classic load balancer found with name: \"kk-1-apiserver\"" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="default/kk-1" namespace="default" name="kk-1" reconcileID="c8d8257c-4a1b-404c-85b6-faaca324cad3"

But machines are actually created in my case and I don't see any extra LBs, so at first glance it looks like it all works.

I overlooked the fact that the AWS provider doesn't like what's going on there. And it looks more like a CAPA-related problem here.

Did you research the possibility of a proper fix? In CAPA perhaps?
Because right now the status patch workaround breaks the declarative deployment flow.

@a13x5 (Author) commented Aug 9, 2024

Also I forgot to add:
Since we don't have any finalizers on the AWSCluster resource when it is deployed with the annotation, it gets deleted instantly.
This is a big problem, since AWSMachines get stuck in this case, because the awsmachine_controller expects the AWSCluster to be present.

Thus, to create a cluster we need to manually patch the status, and to delete it we must manually handle all AWSMachine finalizers.
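For completeness, the manual cleanup on deletion looks roughly like this (a sketch; the AWSMachine name is hypothetical):

# strip the stuck finalizers from an orphaned AWSMachine so its deletion can complete
kubectl patch awsmachine kk-1-worker-0 --type=merge -p '{"metadata":{"finalizers":null}}'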

@jnummelin (Member)

> Did you research the possibility of a proper fix? In CAPA perhaps?

I've had a quick look but unfortunately have not had time to go deep enough to provide an actual fix there. 😢

@a13x5 (Author) commented Aug 26, 2024

@jnummelin I recently tested the Azure provider (CAPZ) and we have a similar situation with it as well.

The main difference between the two is that CAPZ does not misbehave by trying to update the AzureCluster object. It just creates an extra LB and all the connected resources and then continues working as usual.

Given that CAPZ doesn't have any option to disable LB creation (and neither does CAPA), will you consider handling it on the k0smotron side?

@Schnitzel (Contributor)

We talked about this in the k0smotron office working hours:

  • this seems to be an issue not only with CAPA but also with CAPZ, and possibly other CAPI providers
  • we see three ways to solve this
  1. provide a way in CAPA, CAPZ, etc. to tell the provider not to create any load balancers in the case of hosted control planes
  2. implement a solution in CAPI which would then be used by all providers

@a13x5 (Author) commented Oct 2, 2024

I created kubernetes-sigs/cluster-api-provider-aws#5130 in CAPA upstream.
@jnummelin FYI, in case you want to add something there.
