How to restore an etcd backup / import a pre-existing cluster #805

Open
cwrau opened this issue Nov 6, 2024 · 11 comments

Comments

@cwrau
Contributor

cwrau commented Nov 6, 2024

How can I import an etcd snapshot?

I tried mounting the etcd-0 PVC and restoring the snapshot with:

SVC_NAME=kmc-1111-test-cwr-ffm3-2207-etcd; HOSTNAME=kmc-1111-test-cwr-ffm3-2207-etcd-0; ETCDCTL_API=3 /opt/bitnami/etcd/bin/etcdutl snapshot restore /tmp/db --data-dir /var/lib/k0s/etcd --skip-hash-check=true --name=$HOSTNAME --initial-cluster=$HOSTNAME=https://${HOSTNAME}.${SVC_NAME}:2380 --initial-advertise-peer-urls=https://${HOSTNAME}.${SVC_NAME}:2380

This works by itself, and etcd starts successfully.

But etcd-1 cannot join and fails with:

+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › init
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Checking if cluster is functional
kmc-1111-test-cwr-ffm3-2207-etcd-1 init fa49d9dd1d55bc89, started, kmc-1111-test-cwr-ffm3-2207-etcd-0, https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380, https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379, false
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Cluster is functional
kmc-1111-test-cwr-ffm3-2207-etcd-1 init Member 53b0d356f4b5adbd added to cluster   629eef8d065839
kmc-1111-test-cwr-ffm3-2207-etcd-1 init 
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_NAME="kmc-1111-test-cwr-ffm3-2207-etcd-1"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_CLUSTER="kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_ADVERTISE_PEER_URLS="https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"
kmc-1111-test-cwr-ffm3-2207-etcd-1 init ETCD_INITIAL_CLUSTER_STATE="existing"
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › init
+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.325718Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.325953Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326129Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326222Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","kmc-1111-test-cwr-ffm3-2207-etcd-1","--listen-peer-urls=https://0.0.0.0:2380","--listen-client-urls=https://0.0.0.0:2379","--advertise-client-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379","--initial-advertise-peer-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380","--client-cert-auth=true","--tls-min-version=TLS1.2","--trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--cert-file=/var/lib/k0s/pki/etcd/server.crt","--key-file=/var/lib/k0s/pki/etcd/server.key","--peer-trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--peer-key-file=/var/lib/k0s/pki/etcd/peer.key","--peer-cert-file=/var/lib/k0s/pki/etcd/peer.crt","--peer-client-cert-auth=true","--enable-pprof=false","--auto-compaction-mode=periodic","--auto-compaction-retention=5m","--snapshot-count=10000","--data-dir=/var/lib/k0s/etcd"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326379Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"existing","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326462Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"lost+found","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326504Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/k0s/etcd","dir-type":"member"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.326575Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326627Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.326704Z","caller":"embed/etcd.go:494","msg":"starting with peer TLS","tls-info":"cert = /var/lib/k0s/pki/etcd/peer.crt, key = /var/lib/k0s/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /var/lib/k0s/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.327704Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.328033Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.13","git-sha":"c9063a0","go-version":"go1.21.9","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/k0s/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"5m0s","auto-compaction-interval":"5m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.328194Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd\" exist, but the permission is \"dgrwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:30.329632Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd/member/snap\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.342112Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/k0s/etcd/member/snap/db","took":"11.104987ms"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.381738Z","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:30.381854Z","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"fatal","ts":"2024-11-06T15:40:30.381889Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:629eef8d065839 Members:[&{ID:fa49d9dd1d55bc89 RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name:kmc-1111-test-cwr-ffm3-2207-etcd-0 ClientURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379]}} &{ID:53b0d356f4b5adbd RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
+ kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315274Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER","variable-value":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315443Z","caller":"flags/flag.go:113","msg":"recognized and used environment variable","variable-name":"ETCD_INITIAL_CLUSTER_STATE","variable-value":"existing"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315681Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.315727Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","kmc-1111-test-cwr-ffm3-2207-etcd-1","--listen-peer-urls=https://0.0.0.0:2380","--listen-client-urls=https://0.0.0.0:2379","--advertise-client-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379","--initial-advertise-peer-urls=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380","--client-cert-auth=true","--tls-min-version=TLS1.2","--trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--cert-file=/var/lib/k0s/pki/etcd/server.crt","--key-file=/var/lib/k0s/pki/etcd/server.key","--peer-trusted-ca-file=/var/lib/k0s/pki/etcd/ca.crt","--peer-key-file=/var/lib/k0s/pki/etcd/peer.key","--peer-cert-file=/var/lib/k0s/pki/etcd/peer.crt","--peer-client-cert-auth=true","--enable-pprof=false","--auto-compaction-mode=periodic","--auto-compaction-retention=5m","--snapshot-count=10000","--data-dir=/var/lib/k0s/etcd"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315936Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"existing","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.315968Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"lost+found","data-dir":"/var/lib/k0s/etcd"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316006Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/k0s/etcd","dir-type":"member"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.316039Z","caller":"embed/config.go:679","msg":"Running http and grpc server on single port. This is not recommended for production."}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316051Z","caller":"embed/etcd.go:127","msg":"configuring peer listeners","listen-peer-urls":["https://0.0.0.0:2380"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.316133Z","caller":"embed/etcd.go:494","msg":"starting with peer TLS","tls-info":"cert = /var/lib/k0s/pki/etcd/peer.crt, key = /var/lib/k0s/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /var/lib/k0s/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.317559Z","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.317812Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.13","git-sha":"c9063a0","go-version":"go1.21.9","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":false,"name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/k0s/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"listen-peer-urls":["https://0.0.0.0:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"],"listen-client-urls":["https://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"kmc-1111-test-cwr-ffm3-2207-etcd-0=https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-1=https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380,kmc-1111-test-cwr-ffm3-2207-etcd-2=https://kmc-1111-test-cwr-ffm3-2207-etcd-2.kmc-1111-test-cwr-ffm3-2207-etcd:2380","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"5m0s","auto-compaction-interval":"5m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.317945Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd\" exist, but the permission is \"dgrwxrwxr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"warn","ts":"2024-11-06T15:40:31.318273Z","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/k0s/etcd/member/snap\" exist, but the permission is \"dgrwxrwx---\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.320052Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/k0s/etcd/member/snap/db","took":"1.53205ms"}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.341364Z","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"info","ts":"2024-11-06T15:40:31.34162Z","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"kmc-1111-test-cwr-ffm3-2207-etcd-1","data-dir":"/var/lib/k0s/etcd","advertise-peer-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380"],"advertise-client-urls":["https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2379"]}
kmc-1111-test-cwr-ffm3-2207-etcd-1 etcd {"level":"fatal","ts":"2024-11-06T15:40:31.341746Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:629eef8d065839 Members:[&{ID:53b0d356f4b5adbd RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-1.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} &{ID:fa49d9dd1d55bc89 RaftAttributes:{PeerURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2380] IsLearner:false} Attributes:{Name:kmc-1111-test-cwr-ffm3-2207-etcd-0 ClientURLs:[https://kmc-1111-test-cwr-ffm3-2207-etcd-0.kmc-1111-test-cwr-ffm3-2207-etcd:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
- kmc-1111-test-cwr-ffm3-2207-etcd-1 › etcd

By the way, this also crashes etcd-0. I don't have the logs right now, but I can reproduce it 100% of the time.

@jnummelin
Member

What's the overall process for testing this? I mean, is the StatefulSet running with 3 replicas the whole time, how are you scaling it, etc.?

@cwrau
Contributor Author

cwrau commented Nov 7, 2024

> What's the overall process for testing this? I mean, is the StatefulSet running with 3 replicas the whole time, how are you scaling it, etc.?

I scale it to 0 (for which I also have to stop the kmc controller(s), by the way).

Then I copy an etcd snapshot into a PVC, which I mount together with the PVC of etcd-0 into a pod, where I run the aforementioned command.

Then I stop the pod, scale the StatefulSet to 1, wait until it is ready, and then scale it to 2.


BTW, I also tried restoring the snapshot on all nodes, with increasing initial-cluster settings (only node 0 for node 0, nodes 0,1 for node 1, and nodes 0,1,2 for node 2). But this already fails at node 1, as the cluster ID somehow mismatches 😅

Haven't found a way to get this working either 😅
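
A note on the cluster ID mismatch: the etcd disaster recovery documentation restores every member with the same full --initial-cluster list. The Python sketch below models why differing lists could produce differing cluster IDs. It assumes that a restored member's ID is a SHA-1 hash over its sorted peer URLs plus the cluster token, and that the cluster ID is a SHA-1 hash over the sorted member IDs. That derivation is my recollection of etcd's membership code, not something confirmed in this thread or checked against 3.5.13, and the helper names (`peer_url`, `member_id`, `cluster_id`) are made up for illustration.

```python
from __future__ import annotations

import hashlib

TOKEN = "etcd-cluster"  # initial-cluster-token as shown in the etcd-1 log
BASE = "kmc-1111-test-cwr-ffm3-2207-etcd"


def peer_url(index: int) -> str:
    """Peer URL of StatefulSet pod <BASE>-<index>, in the form seen in the logs."""
    return f"https://{BASE}-{index}.{BASE}:2380"


def member_id(peer_urls: list[str], token: str = TOKEN) -> int:
    """ASSUMED model: first 8 bytes of SHA-1 over sorted peer URLs plus the token."""
    data = "".join(sorted(peer_urls)).encode() + token.encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def cluster_id(member_peer_urls: list[str], token: str = TOKEN) -> int:
    """ASSUMED model: first 8 bytes of SHA-1 over the sorted member IDs."""
    ids = sorted(member_id([url], token) for url in member_peer_urls)
    data = b"".join(i.to_bytes(8, "big") for i in ids)
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


# "Increasing" restore: node 0 and node 1 are given different member sets,
# so under this model they end up with different cluster IDs.
node0 = cluster_id([peer_url(0)])
node1 = cluster_id([peer_url(0), peer_url(1)])
print(f"node 0: {node0:x}, node 1: {node1:x}, equal: {node0 == node1}")

# Same member set on every node: the cluster ID is identical everywhere,
# regardless of the order in which the members are listed.
full = [peer_url(i) for i in range(3)]
print(cluster_id(full) == cluster_id(list(reversed(full))))
```

Under this model, every node would need the same complete member list at restore time for the IDs to agree, which is consistent with how the etcd documentation writes its restore commands.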

@makhov
Contributor

makhov commented Nov 8, 2024

Try scaling the K0smotronCluster to 1 node, then scale it up.
The problem with scaling only the etcd StatefulSet is that etcd relies on the ETCD_INITIAL_CLUSTER var, which is currently set to 3 nodes. k0smotron updates the etcd StatefulSet and all the vars gracefully during the scaling process to avoid this kind of issue.
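
To make the mismatch concrete, here is a minimal Python sketch of the count comparison behind the "member count is unequal" error in the etcd-1 log. The values are taken from that log: ETCD_INITIAL_CLUSTER lists three members, while the running cluster only has etcd-0 and the freshly added etcd-1. The function names are hypothetical, and the real etcd validation does more than this (it also matches peer URLs), so treat it as an illustration rather than etcd's actual code.

```python
from __future__ import annotations

from typing import Optional


def parse_initial_cluster(value: str) -> dict[str, str]:
    """Parse an ETCD_INITIAL_CLUSTER string into {member name: peer URL}."""
    members = {}
    for entry in value.split(","):
        name, _, url = entry.partition("=")
        members[name.strip()] = url.strip()
    return members


def check_member_count(initial_cluster: str, existing_peer_urls: list[str]) -> Optional[str]:
    """Return an error message if the configured and actual member counts differ."""
    configured = parse_initial_cluster(initial_cluster)
    if len(configured) != len(existing_peer_urls):
        return (
            f"member count is unequal: {len(configured)} configured, "
            f"{len(existing_peer_urls)} in the running cluster"
        )
    return None


BASE = "kmc-1111-test-cwr-ffm3-2207-etcd"

# What etcd-1 was started with (three members, as in the log).
initial_cluster = ",".join(
    f"{BASE}-{i}=https://{BASE}-{i}.{BASE}:2380" for i in range(3)
)

# What the running cluster actually contained (etcd-0 plus the added etcd-1).
existing = [f"https://{BASE}-{i}.{BASE}:2380" for i in range(2)]

print(check_member_count(initial_cluster, existing))
```

With the StatefulSet scaled by hand, the env var keeps describing three members while the cluster only has two, so the check fails on every restart of etcd-1.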

@cwrau
Contributor Author

cwrau commented Nov 8, 2024

> Try scaling the K0smotronCluster to 1 node, then scale it up. The problem with scaling only the etcd StatefulSet is that etcd relies on the ETCD_INITIAL_CLUSTER var, which is currently set to 3 nodes. k0smotron updates the etcd StatefulSet and all the vars gracefully during the scaling process to avoid this kind of issue.

Ah, yes! That's it! Thank you 👌

Now the only remaining problem is that the CNI doesn't start because it can't connect to the API server (Failed to connect to 10.96.0.1 port 443 after 109 ms: No route to host). I'm still investigating that.

@cwrau
Contributor Author

cwrau commented Nov 8, 2024

I thought that was because the konnectivity pods weren't running, as they only run when the CNI is up.

To test this, I let them run in the host network, which got them started successfully, but that didn't help.

@cwrau
Contributor Author

cwrau commented Nov 8, 2024

I got the cluster started by setting the external API server IP and port for the CNI, but shouldn't it work without that? kube-proxy looks to be correctly configured, has the external API server IP and is running.

The cluster I started from nothing works perfectly

@makhov
Contributor

makhov commented Nov 11, 2024

I suspect that it's related to the old IP addresses. What is the output of kubectl get ep kubernetes? Is it up to date?
And what is in the logs of failing pods?

@cwrau
Contributor Author

cwrau commented Nov 12, 2024

> I suspect that it's related to the old IP addresses. What is the output of kubectl get ep kubernetes?

NAME         ENDPOINTS             AGE
kubernetes   $newIP:6443   475d

> Is it up to date?

Yes, it's the same I'm currently using to access the API

> And what is in the logs of failing pods?

coredns: dial tcp 10.96.0.1:443: connect: connection refused

cloud-controller: dial tcp 10.96.0.1:443: connect: no route to host

cilium-operator is just printing: Establishing connection to apiserver" host="https://10.96.0.1:443", no explicit error message

@cwrau
Contributor Author

cwrau commented Nov 12, 2024

And the individual cilium agents log the following, though this might be because the cilium-operator is down:

time="2024-11-12T09:44:53Z" level=warning msg="Failed to get identity labels for endpoint" error="failed to wait for initial global identities: initial global identity sync was cancelled: context deadline exceeded" k8sEndpointName=coredns-6478896bb8-4cxhd k8sNamespace=kube-system k8sUID=0712803c-e43d-4fc2-9d51-506d33f187f5 subsys=egressgateway

@cwrau
Contributor Author

cwrau commented Nov 12, 2024

Ah, the endpoints were correct, but the endpointslices weren't

I don't know why the Endpoints object had the label endpointslice.kubernetes.io/skip-mirror: "true", but I guess that prevented the EndpointSlices from being fixed automatically.
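
For anyone hitting the same thing after a restore, a small Python sketch of the comparison may help. It assumes you have dumped both objects as JSON, for example with `kubectl get endpoints kubernetes -o json` and `kubectl get endpointslices -l kubernetes.io/service-name=kubernetes -o json`, and it reports addresses that appear in the EndpointSlices but not in the Endpoints. The function names and the sample IPs (192.0.2.x documentation addresses) are invented for illustration and are not values from this cluster.

```python
from __future__ import annotations


def endpoints_ips(endpoints: dict) -> set[str]:
    """Collect the IPs from an Endpoints object (subsets[].addresses[].ip)."""
    return {
        addr["ip"]
        for subset in endpoints.get("subsets") or []
        for addr in subset.get("addresses") or []
    }


def slice_ips(slice_list: dict) -> set[str]:
    """Collect the IPs from an EndpointSlice list (items[].endpoints[].addresses[])."""
    return {
        ip
        for item in slice_list.get("items") or []
        for endpoint in item.get("endpoints") or []
        for ip in endpoint.get("addresses") or []
    }


def stale_slice_ips(endpoints: dict, slice_list: dict) -> set[str]:
    """IPs still listed in the EndpointSlices but absent from the Endpoints."""
    return slice_ips(slice_list) - endpoints_ips(endpoints)


# Hypothetical sample data: Endpoints already has the new IP,
# while the EndpointSlice still carries the one from the old cluster.
endpoints = {"subsets": [{"addresses": [{"ip": "192.0.2.10"}]}]}
slices = {"items": [{"endpoints": [{"addresses": ["192.0.2.99"]}]}]}

print(stale_slice_ips(endpoints, slices))
```

A non-empty result suggests that the EndpointSlices still carry addresses the Endpoints object no longer has, which is the situation described above.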

@cwrau
Contributor Author

cwrau commented Nov 12, 2024

Ok, so most stuff seems to be working, only the webhooks aren't 😅

Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s\": dial tcp 10.102.124.131:443: i/o timeout

Recreating them didn't help.
