How to restore an etcd backup / import a pre-existing cluster #805
Comments
What's the overall process you are using to test this? I mean, is the statefulset running with 3 replicas the whole time, how are you scaling it, etc.?
I scale it to 0 (for which I also have to stop the kmc controller(s), btw). Then I copy an etcd snapshot into a PVC, which I mount together with the PVC of etcd-0 into a pod, and run the aforementioned command in that pod. Then I stop the pod, scale the statefulset to 1, wait until it is ready, and then scale it to 2. BTW, I also tried restoring the snapshot on all nodes with increasing initial-cluster settings (only node 0 for node 0, nodes 0,1 for node 1, and nodes 0,1,2 for node 2). But this already fails at node 1, as the cluster ID somehow mismatches 😅 Haven't found a way to get this working either 😅
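For reference, the increasing `--initial-cluster` values described above can be sketched with a small shell helper. This is only an illustration: `STS_NAME` and the `initial_cluster` function are hypothetical names, and the peer URL pattern is copied from the restore command in the original post. As far as I understand etcd, the cluster ID is derived from the initial member list, so restoring each node with a different list would be expected to produce mismatching IDs.

```shell
#!/bin/sh
# Hypothetical helper: print the --initial-cluster value for members 0..N-1.
# STS_NAME and SVC_NAME are assumptions modelled on the names in this thread.
STS_NAME="${STS_NAME:-kmc-1111-test-cwr-ffm3-2207-etcd}"
SVC_NAME="${SVC_NAME:-$STS_NAME}"

initial_cluster() {
  count="$1"
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    member="${STS_NAME}-${i}"
    entry="${member}=https://${member}.${SVC_NAME}:2380"
    # Join entries with commas, as etcd expects for --initial-cluster.
    if [ -z "$out" ]; then
      out="$entry"
    else
      out="${out},${entry}"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

Calling `initial_cluster 1`, `initial_cluster 2` and `initial_cluster 3` gives the three different member lists used for node 0, node 1 and node 2 in the attempt above.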
Try scaling the K0smotronCluster to 1 node, then scale it up.
Ah, yes! That's it! Thank you 👌 Now the only problem is that the CNI doesn't start because it can't connect to the API server (Failed to connect to 10.96.0.1 port 443 after 109 ms: No route to host), but I'm still investigating that.
I thought that's because the konnectivity pods weren't running, as they only run when the CNI is up. But as a test I let them run in the host network, which successfully started them, but that didn't help.
I got the cluster started by setting the external API server IP and port for the CNI, but shouldn't it work without that? kube-proxy looks to be correctly configured, has the external API server IP and is running. The cluster I started from nothing works perfectly.
I suspect, that it's related to the old IP addresses. What is the output of |
Yes, it's the same I'm currently using to access the API
coredns: cloud-controller: cilium-operator is just printing: |
And the individual cilium agents log the following, tho this might be because the cilium-operator is down
Ah, the endpoints were correct, but the endpointslices weren't. I dunno why the endpoint had
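One way to spot this kind of mismatch is to compare the addresses in the Endpoints object with those in the matching EndpointSlices. The sketch below assumes the object in question is the default `kubernetes` Service in the `default` namespace, which the thread does not state explicitly. The function names are hypothetical; the `kubectl` calls only read data.

```shell
#!/bin/sh
# Sketch: compare API server addresses in Endpoints vs EndpointSlices.
# Assumes the default `kubernetes` Service; adjust names for other services.

# Turn a whitespace-separated address list into sorted, unique lines.
normalise() {
  printf '%s' "$1" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
}

# Succeeds when both lists hold the same set of addresses.
addrs_match() {
  [ "$(normalise "$1")" = "$(normalise "$2")" ]
}

# Addresses recorded in the Endpoints object.
endpoints_ips() {
  kubectl -n default get endpoints kubernetes \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# Addresses recorded in the EndpointSlices that belong to the Service.
slice_ips() {
  kubectl -n default get endpointslices \
    -l kubernetes.io/service-name=kubernetes \
    -o jsonpath='{.items[*].endpoints[*].addresses[*]}'
}

# Example usage against a live cluster:
#   if addrs_match "$(endpoints_ips)" "$(slice_ips)"; then
#     echo "addresses agree"
#   else
#     echo "addresses differ"
#   fi
```

If the two lists differ after a restore, the slices may still carry addresses from the cluster the snapshot was taken from.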
Ok, so most stuff seems to be working, only the webhooks aren't 😅
Recreating them didn't help.
How can I import an etcd snapshot?
I tried mounting the etcd-0 PVC and restoring the snapshot with
SVC_NAME=kmc-1111-test-cwr-ffm3-2207-etcd; HOSTNAME=kmc-1111-test-cwr-ffm3-2207-etcd-0; ETCDCTL_API=3 /opt/bitnami/etcd/bin/etcdutl snapshot restore /tmp/db --data-dir /var/lib/k0s/etcd --skip-hash-check=true --name=$HOSTNAME --initial-cluster=$HOSTNAME=https://${HOSTNAME}.${SVC_NAME}:2380 --initial-advertise-peer-urls=https://${HOSTNAME}.${SVC_NAME}:2380
which works by itself, and etcd starts successfully. But etcd-1 cannot join, erroring with
BTW, this also crashes etcd-0. I don't have the logs right now, but I can reproduce this 100% of the time.