Merge pull request #5297 from twz123/docs-regenerate-ca

Add troubleshooting section on regenerating CAs
k0sproject · Dec 10, 2024 · 537f1bb
2 parents b341fc1 + 8924155

Showing 9 changed files with 120 additions and 10 deletions.
diff --git a/docs/custom-ca.md b/docs/custom-ca.md
@@ -38,3 +38,7 @@ Here's an example of a command for pre-generating a token for a controller.
 ```shell
 k0s token pre-shared --role controller --cert /var/lib/k0s/pki/ca.crt --url https://<controller-ip>:9443/
 ```
+
+## See also
+
+- [Certificate Authorities](troubleshooting/certificate-authorities.md)
diff --git a/docs/k0s-multi-node.md b/docs/k0s-multi-node.md
@@ -64,7 +64,7 @@ To get a token, run the following command on one of the existing controller node
 sudo k0s token create --role=worker
 ```
 
-The resulting output is a long [token](#about-tokens) string, which you can use to add a worker to the cluster.
+The resulting output is a long [token](#about-join-tokens) string, which you can use to add a worker to the cluster.
 
 For enhanced security, run the following command to set an expiration time for the token:
 
@@ -84,7 +84,7 @@ sudo k0s install worker --token-file /path/to/token/file
 sudo k0s start
 ```
 
-#### About tokens
+#### About join tokens
 
 The join tokens are base64-encoded [kubeconfigs](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) for several reasons:
 

diff --git a/docs/runtime.md b/docs/runtime.md
@@ -266,7 +266,7 @@ metrics][cadvisor-metrics] when using cri-dockerd.
 [install cri-dockerd]: https://github.com/Mirantis/cri-dockerd#using-cri-dockerd
 [worker profiles]: worker-node-config.md#worker-profiles
 [dynamic configuration]: dynamic-configuration.md
-[cadvisor-metrics]: ./troubleshooting.md#using-a-custom-container-runtime-and-missing-labels-in-prometheus-metrics
+[cadvisor-metrics]: ./troubleshooting/troubleshooting.md#using-a-custom-container-runtime-and-missing-labels-in-prometheus-metrics
 
 #### Verification
 

diff --git a/docs/FAQ.md → docs/troubleshooting/FAQ.md b/docs/FAQ.md → docs/troubleshooting/FAQ.md
@@ -31,3 +31,12 @@ As a default, the control plane does not run kubelet at all, and will not accept
 ## Is k0sproject really open source?
 
 Yes, k0sproject is 100% open source. The source code is under Apache 2 and the documentation is under the Creative Commons License. Mirantis, Inc. is the main contributor and sponsor for this OSS project: building all the binaries from upstream, performing necessary security scans and calculating checksums so that it's easy and safe to use. The use of these ready-made binaries are subject to Mirantis EULA and the binaries include only open source software.
+
+## A kubeconfig created via [`k0s kubeconfig`](../cli/k0s_kubeconfig.md) has been leaked, what can I do?
+
+Kubernetes does not support certificate revocation (see [k/k/18982]). This means
+that you cannot disable the leaked credentials. The only way to effectively
+revoke them is to [replace the Kubernetes CA] for your cluster.
+
+[k/k/18982]: https://github.com/kubernetes/kubernetes/issues/18982
+[replace the Kubernetes CA]: certificate-authorities.md#replacing-the-kubernetes-ca-and-sa-key-pair
diff --git a/docs/troubleshooting/certificate-authorities.md b/docs/troubleshooting/certificate-authorities.md
@@ -0,0 +1,96 @@
+# Certificate Authorities (CAs)
+
+## Overview of CAs managed by k0s
+
+k0s maintains two Certificate Authorities and one public/private key pair:
+
+* The **Kubernetes CA** is used to secure the Kubernetes cluster and manage
+  client and server certificates for API communication.
+* The **etcd CA** is used only when managed etcd is enabled, for securing etcd
+  communications.
+* The **Kubernetes Service Account (SA) key pair** is used for signing
+  Kubernetes [service account tokens].
+
+These CAs are automatically created during cluster initialization and have a
+default expiration period of 10 years. They are distributed once to all k0s
+controllers as part of k0s's [join process]. Replacing them is a manual process,
+as k0s currently lacks automation for CA renewal.
+
+[service account tokens]: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
+[join process]: ../k0s-multi-node.md#5-add-controllers-to-the-cluster
+
+## Replacing the Kubernetes CA and SA key pair
+
+The following steps describe how to manually replace the Kubernetes CA and
+SA key pair by taking the cluster down, regenerating them, redistributing them
+to all nodes, and then bringing the cluster back online:
+
+1. Take a [backup]! Things might go wrong at any level.
+
+2. Stop k0s on all worker and controller nodes. All the instructions below
+   assume that all k0s nodes are using the default data directory
+   `/var/lib/k0s`. Please adjust accordingly if you're using a different data
+   directory path.
+
+3. Delete the Kubernetes CA and SA key pair files from all the controller
+   data directories:
+
+   * `/var/lib/k0s/pki/ca.crt`
+   * `/var/lib/k0s/pki/ca.key`
+   * `/var/lib/k0s/pki/sa.pub`
+   * `/var/lib/k0s/pki/sa.key`
+
+   Delete the kubelet's kubeconfig file and the kubelet's PKI directory from all
+   worker data directories. Note that this includes controllers that have been
+   started with the `--enable-worker` flag:
+
+   * `/var/lib/k0s/kubelet.conf`
+   * `/var/lib/k0s/kubelet/pki`
+
+4. Choose one controller as the "first" one. Restart k0s on the first
+   controller. If this controller is running with the `--enable-worker` flag,
+   you should **reboot the machine** instead. This ensures that all
+   processes and pods are cleanly restarted. After the restart, k0s will
+   have generated a new Kubernetes CA and SA key pair.
+
+5. Distribute the new CA and SA key pair to the other controllers: Copy over the
+   following files from the first controller to each of the remaining
+   controllers:
+
+   * `/var/lib/k0s/pki/ca.crt`
+   * `/var/lib/k0s/pki/ca.key`
+   * `/var/lib/k0s/pki/sa.pub`
+   * `/var/lib/k0s/pki/sa.key`
+
+   After copying the files, the new CA and SA key pair are in place. Restart k0s
+   on the other controllers. For controllers running with the `--enable-worker`
+   flag, **reboot the machines** instead.
+
+6. Rejoin all workers. The easiest way to do this is to use a
+   `kubelet-bootstrap.conf` file. You can [generate](../cli/k0s_token_create.md)
+   such a file on a controller like this (see the section on [join tokens] for
+   details):
+
+   ```sh
+   touch /tmp/rejoin-token &&
+     chmod 0600 /tmp/rejoin-token &&
+     k0s token create --expiry 1h |
+     base64 -d |
+     gunzip >/tmp/rejoin-token
+   ```
+
+   Copy that file to each worker node and place it at
+   `/var/lib/k0s/kubelet-bootstrap.conf`. Then reboot the machine.
+
+7. When all workers are back online, the `kubelet-bootstrap.conf` files can be
+   safely removed from the workers. You can also invalidate the token so you
+   don't have to wait for it to expire: Use [`k0s token list --role
+   worker`](../cli/k0s_token_list.md) to list all tokens and [`k0s token
+   invalidate <token-id>`](../cli/k0s_token_invalidate.md) to invalidate them immediately.
+
+[backup]: ../backup.md
+[join tokens]: ../k0s-multi-node.md#about-join-tokens
+
+## See also
+
+* [Install using custom CAs](../custom-ca.md)
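
Review note on step 3 of the new page: the two deletion lists could be scripted roughly as follows. This is a minimal sketch, assuming k0s is already stopped on the node and a backup exists. The function names `remove_controller_ca_files` and `remove_worker_kubelet_files` are hypothetical helpers, not part of k0s; the default path is the `/var/lib/k0s` data directory named in the text.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of k0s.
# Run them only after k0s has been stopped and a backup has been taken.

# Controllers: remove the Kubernetes CA and the service account key pair.
remove_controller_ca_files() {
  data_dir=${1:-/var/lib/k0s}
  for f in ca.crt ca.key sa.pub sa.key; do
    rm -f -- "$data_dir/pki/$f"
  done
}

# Workers (including controllers started with --enable-worker):
# remove the kubelet's kubeconfig and the kubelet's PKI directory.
remove_worker_kubelet_files() {
  data_dir=${1:-/var/lib/k0s}
  rm -f -- "$data_dir/kubelet.conf"
  rm -rf -- "$data_dir/kubelet/pki"
}
```

Both functions take the data directory as an optional argument, so a node with a non-default data directory could call, for example, `remove_controller_ca_files /custom/data/dir`. Other files under `pki/` are left in place because only the four files listed in step 3 are removed.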
diff --git a/docs/logs.md → docs/troubleshooting/logs.md b/docs/logs.md → docs/troubleshooting/logs.md
diff --git a/docs/support-dump.md → docs/troubleshooting/support-dump.md b/docs/support-dump.md → docs/troubleshooting/support-dump.md
@@ -1,6 +1,6 @@
 # Support Insight
 
-In many cases, especially when looking for [commercial support](commercial-support.md) there's a need for share the cluster state with other people.
+In many cases, especially when looking for [commercial support](../commercial-support.md), there's a need to share the cluster state with other people.
 While one could always give access to the live cluster that is not always desired nor even possible.
 
 For those kind of cases we can lean on the work our friends at [troubleshoot.sh](https://troubleshoot.sh) have done.

diff --git a/docs/troubleshooting.md → docs/troubleshooting/troubleshooting.md b/docs/troubleshooting.md → docs/troubleshooting/troubleshooting.md
@@ -67,7 +67,7 @@ io.containerd.snapshotter.v1    zfs                      linux/amd64    ok
 ...
 ```
 
-- create a containerd config according to the [documentation](runtime.md): `$ containerd config default > /etc/k0s/containerd.toml`
+- create a containerd config according to the [documentation](../runtime.md): `$ containerd config default > /etc/k0s/containerd.toml`
 - modify the line in `/etc/k0s/containerd.toml`:
 
 ```toml
@@ -92,7 +92,7 @@ to
 
 ## Pods pending when using cloud providers
 
-Once we enable [cloud provider support](cloud-providers.md) on kubelet on worker nodes, kubelet will automatically add a taint `node.cloudprovider.kubernetes.io/uninitialized` for the node. This tain will prevent normal workloads to be scheduled on the node until the cloud provider controller actually runs second initialization on the node and removes the taint. This means that these nodes are not available for scheduling until the cloud provider controller is actually successfully running on the cluster.
+Once we enable [cloud provider support](../cloud-providers.md) on kubelet on worker nodes, kubelet will automatically add a taint `node.cloudprovider.kubernetes.io/uninitialized` for the node. This taint will prevent normal workloads from being scheduled on the node until the cloud provider controller actually runs second initialization on the node and removes the taint. This means that these nodes are not available for scheduling until the cloud provider controller is actually successfully running on the cluster.
 
 For troubleshooting your specific cloud provider see its documentation.
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -68,10 +68,11 @@ nav:
       - GitOps with Flux: examples/gitops-flux.md
       - OpenEBS storage: examples/openebs.md
   - Troubleshooting:
-      - FAQ: FAQ.md
-      - Logs: logs.md
-      - Common Pitfalls: troubleshooting.md
-      - Support Insights: support-dump.md
+      - FAQ: troubleshooting/FAQ.md
+      - Logs: troubleshooting/logs.md
+      - Common Pitfalls: troubleshooting/troubleshooting.md
+      - Support Insights: troubleshooting/support-dump.md
+      - Certificate Authorities (CAs): troubleshooting/certificate-authorities.md
   - Reference:
       - Architecture: architecture/index.md
       - Command Line: cli/README.md
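
Review note on step 5 of `docs/troubleshooting/certificate-authorities.md`: after copying the CA and SA key pair files, it may help to confirm that every controller holds identical copies before restarting k0s. The following is a minimal sketch, assuming `sha256sum` from GNU coreutils is available on the controllers. `print_ca_checksums` is a hypothetical helper, not a k0s command.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of k0s.
# Prints SHA-256 checksums of the four files listed in step 5.

print_ca_checksums() {
  data_dir=${1:-/var/lib/k0s}
  # Change into the pki directory in a subshell so the output contains only
  # file names and can be compared directly between machines.
  (cd "$data_dir/pki" && sha256sum ca.crt ca.key sa.pub sa.key)
}
```

If the copy succeeded, running `print_ca_checksums` on the first controller and on each remaining controller yields identical output.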