Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ETCD PKI provisioning Issue in Karmada Operator #5975

Closed
jabellard opened this issue Dec 25, 2024 · 14 comments · Fixed by #5976
Closed

Fix ETCD PKI provisioning Issue in Karmada Operator #5975

jabellard opened this issue Dec 25, 2024 · 14 comments · Fixed by #5976
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@jabellard
Copy link
Contributor

What happened:
When provisioning a Karmada instance to be managed by the Karmada operator, there's support for both in-cluster and external ETCD modes. When external ETCD mode is used, the operator should skip setting up an in-cluster ETCD instance, which entails setting up PKI for the instance and deploying it as a stateful set. However, currently, in external ETCD mode, the operator will skip deploying the stateful set, but will still generate PKI, which is an issue. In external ETCD mode, all tasks relevant to setting up an in-cluster instance should be skipped.

What you expected to happen:
In external ETCD mode, all tasks relevant to setting up an in-cluster ETCD instance, including setting up PKI for the instance, should be skipped.

How to reproduce it (as minimally and precisely as possible):
Try to provision a new control plane using an external ETCD instance via the Karmada operator.

Anything else we need to know?:

Environment:

  • Karmada version: v1.12
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): N/A
  • Others: N/A
@jabellard
Copy link
Contributor Author

Hey @RainbowMango . I just submitted a PR for this. Can you please take a look?

@RainbowMango
Copy link
Member

Sure, I am working on it.

@RainbowMango
Copy link
Member

However, currently, in external ETCD mode, the operator will skip deploying the stateful set, but will still generate PKI, which is an issue. In external ETCD mode, all tasks relevant to setting up an in-cluster instance should be skipped.

I agree that in external ETCD mode, we don't need to generate PKI for in-cluster ETCD.
But, just out of curiosity, would generating PKI bring any serious problems?

@RainbowMango
Copy link
Member

/assign @jabellard
In favor of #5976

@RainbowMango RainbowMango added this to the v1.13 milestone Dec 26, 2024
@jabellard
Copy link
Contributor Author

However, currently, in external ETCD mode, the operator will skip deploying the stateful set, but will still generate PKI, which is an issue. In external ETCD mode, all tasks relevant to setting up an in-cluster instance should be skipped.

I agree that in external ETCD mode, we don't need to generate PKI for in-cluster ETCD. But, just out of curiosity, would generating PKI bring any serious problems?

Great question. The name of the secret that holds the cert for the external ETCD instance must adhere to a convention that's enforced by the operator. If we don't skip generating PKI for an in-cluster instance in external mode, then the operator will overwrite the data in that secret with data for the cert it needlessly generates.

@RainbowMango
Copy link
Member

Oh, that sounds bad. That means people can not use the external ETCD when provisioning a Karmada instance, isn't it?

@jabellard
Copy link
Contributor Author

Oh, that sounds bad. That means people can not use the external ETCD when provisioning a Karmada instance, isn't it?

Yeah. That's the issue.

@RainbowMango
Copy link
Member

That's surprising, this issue was supposed to be discovered earlier when we implemented this feature at #5242.
Don't get me wrong, I'm saying that didn't mean to blame anyone, just want to know what we missed during the previous development process.

@RainbowMango
Copy link
Member

By the way, are you back from Christmas vacation? I know a lot of guys will take 1~2 weeks off.

@jabellard
Copy link
Contributor Author

That's surprising, this issue was supposed to be discovered earlier when we implemented this feature at #5242. Don't get me wrong, I'm saying that didn't mean to blame anyone, just want to know what we missed during the previous development process.

I agree. I was also surprised to see that. E2E testing worked perfectly fine before we imposed the naming convention at some point for the external secret so that it has the exact same name as the secret that would be generated in in-cluster mode. I think this slipped through around that time.

@jabellard
Copy link
Contributor Author

By the way, are you back from Christmas vacation? I know a lot of guys will take 1~2 weeks off.

Found some free time today to do some coding. Will be back from vacation first week of next year.

@RainbowMango
Copy link
Member

E2E testing worked perfectly fine before we imposed the naming convention at some point for the external secret so that it has the exact same name as the secret that would be generated in in-cluster mode. I think this slipped through around that time.

OK. I will investigate it further and then back to #5976. I want to know exactly which PR breaks this.

@zhzhuang-zju
Copy link
Contributor

@jabellard I combs through this issue and please correct me if there is anything incorrect or incomplete.

In #5699, the API ExternalEtcd.SecretRef was introduced, at which point the naming requirement for the secret name is {{karmada-name}}-karmada-apiserver-etcd-client-cert.
In #5720, the relevant capability was implemented, but the naming requirement for the secret name was changed to {{karmada-name}}-etcd-cert, which is the same as the name of the local ETCD secret automatically generated by the operator.
In the task NewUploadCertsTask, automatically generated certificates will be uploaded to the corresponding secret, see

func runUploadEtcdCert(r workflow.RunData) error {
data, ok := r.(InitData)
if !ok {
return errors.New("upload-etcdCert task invoked with an invalid data struct")
}
ca := data.GetCert(constants.EtcdCaCertAndKeyName)
server := data.GetCert(constants.EtcdServerCertAndKeyName)
client := data.GetCert(constants.EtcdClientCertAndKeyName)
err := apiclient.CreateOrUpdateSecret(data.RemoteClient(), &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Namespace: data.GetNamespace(),
Name: util.EtcdCertSecretName(data.GetName()),
Labels: constants.KarmadaOperatorLabel,

// EtcdCertSecretName returns secret name of etcd cert
func EtcdCertSecretName(karmada string) string {
return generateResourceName(karmada, "etcd-cert")
}

This results in the content of the secret specified by ExternalEtcd.SecretRef being overwritten. Furthermore, because the task NewUploadCertsTask occurs relatively early in the process, the ETCD certificates actually used by subsequent component initializations are the ones automatically generated by the operator.

@RainbowMango
Copy link
Member

Thanks @zhzhuang-zju for the clarification. The issue was leaked due to a lack of E2E test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

3 participants