How to restore an etcd backup / import a pre-existing cluster #805

cwrau · 2024-11-06T15:57:56Z

How can I import an etcd snapshot?

I tried mounting the etcd-0 PVC and restoring the snapshot with SVC_NAME=kmc-1111-test-cwr-ffm3-2207-etcd; HOSTNAME=kmc-1111-test-cwr-ffm3-2207-etcd-0; ETCDCTL_API=3 /opt/bitnami/etcd/bin/etcdutl snapshot restore /tmp/db --data-dir /var/lib/k0s/etcd --skip-hash-check=true --name=$HOSTNAME --initial-cluster=$HOSTNAME=https://${HOSTNAME}.${SVC_NAME}:2380 --initial-advertise-peer-urls=https://${HOSTNAME}.${SVC_NAME}:2380, which works by itself, and etcd starts successfully.
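
For readability, the restore command above can be split into a small script. This is only a sketch under the same assumptions as the original command (snapshot at /tmp/db, data directory /var/lib/k0s/etcd, etcdutl at the Bitnami path). The helper names peer_url and restore_snapshot are hypothetical and not part of etcd or k0smotron.

```shell
#!/bin/sh
# Sketch only: same flags as the one-line command above, split for readability.
SVC_NAME="kmc-1111-test-cwr-ffm3-2207-etcd"
HOSTNAME="kmc-1111-test-cwr-ffm3-2207-etcd-0"

# peer_url HOST SVC: builds the peer URL that etcd advertises for one member.
# Hypothetical helper, introduced here for illustration.
peer_url() {
  printf 'https://%s.%s:2380' "$1" "$2"
}

# restore_snapshot: restores /tmp/db into the etcd-0 data directory as a
# single-member cluster. It is only defined here, not called; invoke it
# inside the helper pod that mounts both PVCs.
restore_snapshot() {
  ETCDCTL_API=3 /opt/bitnami/etcd/bin/etcdutl snapshot restore /tmp/db \
    --data-dir /var/lib/k0s/etcd \
    --skip-hash-check=true \
    --name="$HOSTNAME" \
    --initial-cluster="$HOSTNAME=$(peer_url "$HOSTNAME" "$SVC_NAME")" \
    --initial-advertise-peer-urls="$(peer_url "$HOSTNAME" "$SVC_NAME")"
}
```

Building the peer URL in one place keeps --initial-cluster and --initial-advertise-peer-urls in sync, which matters because etcd validates the peer URLs against each other on startup.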

But etcd-1 cannot join, erroring with

+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › init
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Checking if cluster is functional
kmc-1111-test-cwr-ffm3-2207-etcd-1 init fa49d9dd1d55bc89, started, kmc-1111-test-cwr-ffm3-2207-etcd-0, https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380, https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379, false
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Cluster is functional
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Member 53b0d356f4b5adbd added to cluster   629eef8d065839
kmc-1111-test-cwr-ffm3-2207-etcd-1 init 
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_NAME="kmc-1111-test-cwr-ffm3-2207-etcd-1"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_CLUSTER="kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_ADVERTISE_PEER_URLS="https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_CLUSTER_STATE="existing"
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › init
+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.325718Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.325953Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326129Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326222Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","kmc-1111-test-cwr-ffm3-2207-etcd-1","--listen-peer-urls=https://0.0.0.0:2380","--listen-client-urls=https://0.0.0.0:2379","--advertise-client-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379","--initial-advertise-peer-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380","--client-cert-auth=true","--tls-min-version=TLS1.2","--trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--cert-file=/var/lib/k0s/pki/etcd/server.crt","--key-file=/var/lib/k0s/pki/etcd/server.key","--peer-trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--peer-key-file=/var/lib/k0s/pki/etcd/peer.key","--peer-cert-file=/var/lib/k0s/pki/etcd/peer.crt","--peer-client-cert-auth=true","--enable-pprof=false","--auto-compaction-mode=periodic","--auto-compaction-retention=5m","--snapshot-count=10000","--data-dir=/var/lib/k0s/etcd"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326379Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"existing","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326462Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"lost+found","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326504Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/k0s/etcd","dir-type":"member"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326575Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326627Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326704Z","caller":"embed/etcd.go:494","msg":"starting with peer TLS","tls-info":"cert = /var/lib/k0s/pki/etcd/peer.crt, key = /var/lib/k0s/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /var/lib/k0s/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.327704Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.328033Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.13","git-sha":"c9063a0","go-version":"go1.21.9","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/k0s/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"5m0s","auto-compaction-interval":"5m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.328194Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd\" exist, but the permission is \"dgrwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.329632Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd/member/snap\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.342112Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/k0s/etcd/member/snap/db","took":"11.104987ms"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.381738Z","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.381854Z","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"fatal","ts":"2024-11-06T15:40:30.381889Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:629eef8d065839 Members:[&{ID:fa49d9dd1d55bc89 RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name:kmc-1111-test-cwr-ffm3-2207-etcd-0 ClientURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379]}} &{ID:53b0d356f4b5adbd RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315274Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315443Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315681Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315727Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","kmc-1111-test-cwr-ffm3-2207-etcd-1","--listen-peer-urls=https://0.0.0.0:2380","--listen-client-urls=https://0.0.0.0:2379","--advertise-client-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379","--initial-advertise-peer-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380","--client-cert-auth=true","--tls-min-version=TLS1.2","--trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--cert-file=/var/lib/k0s/pki/etcd/server.crt","--key-file=/var/lib/k0s/pki/etcd/server.key","--peer-trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--peer-key-file=/var/lib/k0s/pki/etcd/peer.key","--peer-cert-file=/var/lib/k0s/pki/etcd/peer.crt","--peer-client-cert-auth=true","--enable-pprof=false","--auto-compaction-mode=periodic","--auto-compaction-retention=5m","--snapshot-count=10000","--data-dir=/var/lib/k0s/etcd"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315936Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"existing","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315968Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"lost+found","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316006Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/k0s/etcd","dir-type":"member"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.316039Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316051Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316133Z","caller":"embed/etcd.go:494","msg":"starting with peer TLS","tls-info":"cert = /var/lib/k0s/pki/etcd/peer.crt, key = /var/lib/k0s/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /var/lib/k0s/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.317559Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.317812Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.13","git-sha":"c9063a0","go-version":"go1.21.9","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/k0s/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"5m0s","auto-compaction-interval":"5m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.317945Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd\" exist, but the permission is \"dgrwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.318273Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd/member/snap\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.320052Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/k0s/etcd/member/snap/db","took":"1.53205ms"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.341364Z","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.34162Z","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"fatal","ts":"2024-11-06T15:40:31.341746Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:629eef8d065839 Members:[&{ID:53b0d356f4b5adbd RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:fa49d9dd1d55bc89 RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name:kmc-1111-test-cwr-ffm3-2207-etcd-0 ClientURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd

By the way, this also crashes etcd-0. I don't have the logs right now, but I can reproduce it every time.

jnummelin · 2024-11-07T09:24:46Z

What's the overall process you are using to test this? I mean, is the StatefulSet running with 3 replicas the whole time, how are you scaling it, etc.?

cwrau · 2024-11-07T11:05:46Z

What's the overall process you are using to test this? I mean, is the StatefulSet running with 3 replicas the whole time, how are you scaling it, etc.?

I scale it to 0, for which I also have to stop the kmc controller(s), by the way.

Then I mount a PVC containing the etcd snapshot together with the PVC of etcd-0 into a pod, in which I run the aforementioned command.

Then I stop the pod, scale the statefulset to 1, wait until it is ready and then scale it to 2.

BTW, I also tried restoring the snapshot on all nodes, with increasing initial-cluster settings (only node 0 for node 0, nodes 0 and 1 for node 1, and nodes 0, 1 and 2 for node 2). But this already fails on node 1, as the cluster ID somehow doesn't match 😅

Haven't found a way to get this working either 😅

makhov · 2024-11-08T08:47:27Z

Try scaling the K0smotronCluster to 1 node, then scale it up.
The problem with scaling only etcd statefulset is that etcd relies on the ETCD_INITIAL_CLUSTER var, which is set to 3 nodes now. k0smotron updates etcd sts and all the vars gracefully during the scaling process to avoid this kind of issue.
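
The mismatch described here is visible in the logs above: ETCD_INITIAL_CLUSTER lists three members while the restored cluster only knows two, which is what "member count is unequal" reports. A small sketch for counting the entries in such a value follows. count_members is a hypothetical helper, not part of etcd or k0smotron, and the member lists are shortened stand-ins for the real values in the logs.

```shell
#!/bin/sh
# count_members LIST: counts the comma-separated name=url entries in an
# ETCD_INITIAL_CLUSTER-style value. Hypothetical helper for illustration.
count_members() {
  if [ -z "$1" ]; then
    echo 0
    return
  fi
  printf '%s\n' "$1" | tr ',' '\n' | grep -c '='
}

# Shortened example values; the real ones appear in the logs above.
configured="etcd-0=https://etcd-0.etcd:2380,etcd-1=https://etcd-1.etcd:2380,etcd-2=https://etcd-2.etcd:2380"
actual="etcd-0=https://etcd-0.etcd:2380,etcd-1=https://etcd-1.etcd:2380"

# etcd refuses to join when the configured list and the cluster's member
# list differ in size; this mirrors that comparison.
if [ "$(count_members "$configured")" -ne "$(count_members "$actual")" ]; then
  echo "member count is unequal: configured=$(count_members "$configured") actual=$(count_members "$actual")"
fi
```

Scaling through k0smotron instead of the StatefulSet keeps the configured list the same size as the actual member list at each step.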

cwrau · 2024-11-08T10:14:52Z

Try scaling the K0smotronCluster to 1 node, then scale it up. The problem with scaling only etcd statefulset is that etcd relies on the ETCD_INITIAL_CLUSTER var, which is set to 3 nodes now. k0smotron updates etcd sts and all the vars gracefully during the scaling process to avoid this kind of issue.

Ah, yes! That's it! Thank you 👌

Now my only remaining problem is that the CNI doesn't start because it can't connect to the API server (Failed to connect to 10.96.0.1 port 443 after 109 ms: No route to host), but I'm still investigating that.

cwrau · 2024-11-08T10:25:46Z

I thought that was because the konnectivity pods weren't running, as they only run when the CNI is up.

To test that, I ran them in the host network, which started them successfully, but that didn't help.

cwrau · 2024-11-08T10:50:36Z

I got the cluster started by setting the external API server IP and port for the CNI, but shouldn't it work without that? kube-proxy looks to be correctly configured, has the external API server IP and is running.

The cluster I started from nothing works perfectly

makhov · 2024-11-11T12:59:37Z

I suspect that it's related to the old IP addresses. What is the output of kubectl get ep kubernetes? Is it up to date?
And what is in the logs of failing pods?

cwrau · 2024-11-12T08:56:36Z

I suspect that it's related to the old IP addresses. What is the output of kubectl get ep kubernetes?

NAME         ENDPOINTS             AGE
kubernetes   $newIP:6443   475d

Is it up to date?

Yes, it's the same I'm currently using to access the API

And what is in the logs of failing pods?

coredns: dial tcp 10.96.0.1:443: connect: connection refused

cloud-controller: dial tcp 10.96.0.1:443: connect: no route to host

cilium-operator is just printing: "Establishing connection to apiserver" host="https://10.96.0.1:443", with no explicit error message

cwrau · 2024-11-12T09:47:11Z

And the individual cilium agents log the following, though this might be because the cilium-operator is down:

time="2024-11-12T09:44:53Z" level=warning msg="Failed to get identity labels for endpoint" error="failed to wait for initial global identities: initial global identity sync was cancelled: context deadline exceeded" k8sEndpointName=coredns-6478896bb8-4cxhd k8sNamespace=kube-system k8sUID=0712803c-e43d-4fc2-9d51-506d33f187f5 subsys=egressgateway

cwrau · 2024-11-12T11:12:05Z

Ah, the endpoints were correct, but the endpointslices weren't

I dunno why the endpoint had endpointslice.kubernetes.io/skip-mirror: "true" but I guess that prevented that from being fixed automatically
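
A quick way to spot this kind of drift is to compare the addresses in the Endpoints object with those in the EndpointSlice. The sketch below assumes kubectl access to the restored cluster and that the slice for the kubernetes Service carries the standard kubernetes.io/service-name label. The helper names endpoints_ips, slice_ips and same_addresses are hypothetical.

```shell
#!/bin/sh
# endpoints_ips: addresses from the Endpoints object of the kubernetes Service.
endpoints_ips() {
  kubectl -n default get endpoints kubernetes \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# slice_ips: addresses from the EndpointSlice(s) of the same Service.
slice_ips() {
  kubectl -n default get endpointslices \
    -l kubernetes.io/service-name=kubernetes \
    -o jsonpath='{.items[*].endpoints[*].addresses[*]}'
}

# same_addresses A B: succeeds when both space-separated lists contain the
# same addresses, regardless of order. The arguments are left unquoted on
# purpose so that each address ends up on its own line before sorting.
same_addresses() {
  a=$(printf '%s\n' $1 | sort -u)
  b=$(printf '%s\n' $2 | sort -u)
  [ "$a" = "$b" ]
}

# Example (needs a cluster):
#   same_addresses "$(endpoints_ips)" "$(slice_ips)" || echo "EndpointSlice is stale"
```

If the two lists differ after a restore, the stale EndpointSlice is the one that in-cluster clients of 10.96.0.1 are likely to be routed by, which would match the connection errors above.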

cwrau · 2024-11-12T11:19:58Z

Ok, so most stuff seems to be working, only the webhooks aren't 😅

Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s\": dial tcp 10.102.124.131:443: i/o timeout

recreating them didn't help
