k0s kubectl connection fails after upgrading to K0s v1.31.2 #5350

Open
jsalgado78 opened this issue Dec 11, 2024 · 9 comments
Labels
bug Something isn't working

Comments


jsalgado78 commented Dec 11, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Linux 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 11 18:02:00 EDT 2024 x86_64 GNU/Linux
NAME="Red Hat Enterprise Linux"
VERSION="8.10 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://issues.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.10
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"

Version

v1.31.2+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 7.7 GiB (pass)
File system of /var/lib/k0s: xfs (pass)
Disk space available for /var/lib/k0s: 3.6 GiB (pass)
Relative disk space available for /var/lib/k0s: 30% (pass)
Name resolution: localhost: [127.0.0.1 ::1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 4.18.0-553.22.1.el8_10.x86_64 (pass)
  Max. file descriptors per process: current: 65536 / max: 65536 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 1 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (pass)
    cgroup controller "freezer": available (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: unknown (warning: also tried CONFIG_IP_NF_TARGET_MASQUERADE, CONFIG_IP6_NF_TARGET_MASQUERADE)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I've upgraded an on-premises k0s cluster with three controller nodes and three worker nodes from k0s v1.30.4 to v1.31.2 using k0sctl 0.20.0, and now I get this error message on each controller node when I execute k0s kubectl:

E1211 17:33:36.392466 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
Unable to connect to the server: Forbidden

Steps to reproduce

  1. Upgrade k0s from v1.30.4 to v1.31.2 using k0sctl v0.20.0
  2. Run the k0s kubectl get nodes command on each controller node
  3. The connection to the API server fails: Unable to connect to the server: Forbidden

Expected behavior

k0s kubectl commands should work on each controller node, using /var/lib/k0s/pki/admin.conf as the kubeconfig file

Actual behavior

The /var/lib/k0s/pki/admin.conf file no longer uses https://localhost:6443 as the API server URL as previous versions did, and the /etc/k0s/k0s.yaml file no longer includes 127.0.0.1 in the sans parameter, so k0s kubectl commands fail with a Forbidden error message.

If I run kubectl with my own previously generated kubeconfig, everything appears to be fine in this cluster.
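
If the missing loopback SAN really is the cause, one possible workaround (a sketch only, not verified on this cluster) would be to list the loopback address explicitly under spec.api.sans in the k0s/k0sctl configuration, in the same style as the config excerpts elsewhere in this thread:

    # Hypothetical excerpt for /etc/k0s/k0s.yaml (or the k0sctl spec);
    # spec.api.sans is the k0s field for additional API server SANs.
    spec:
      api:
        sans:
          - 127.0.0.1
          - localhost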

Screenshots and logs

No response

Additional context

[root]# k0s kubectl get nodes
E1211 17:33:36.362183 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:33:36.368966 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:33:36.376438 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:33:36.384324 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:33:36.392466 13798 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
Unable to connect to the server: Forbidden

These controller nodes have three network interfaces. The controller node in this example has the IPs 192.168.80.132, 192.168.60.132 and 10.230.222.132.

Something is wrong in /var/lib/k0s/pki/admin.conf with k0s 1.31.2, because I've upgraded k0s from previous versions before without hitting issues like this:

[root]# grep server /var/lib/k0s/pki/admin.conf
server: https://192.168.80.132:6443

[root]# sed -i 's/192.168.80.132/localhost/' /var/lib/k0s/pki/admin.conf
[root]# k0s kubectl get nodes
NAME            STATUS   ROLES    AGE      VERSION
s504pre144k0s   Ready    <none>   2y150d   v1.31.2+k0s
s504pre145k0s   Ready    <none>   2y150d   v1.31.2+k0s
s504pre146k0s   Ready    <none>   2y150d   v1.31.2+k0s

[root]# systemctl restart k0scontroller
[root]# k0s kubectl get nodes
E1211 17:40:19.164057 14737 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:40:19.172882 14737 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:40:19.181674 14737 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:40:19.189815 14737 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1211 17:40:19.197323 14737 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
Unable to connect to the server: Forbidden

[root]# grep server /var/lib/k0s/pki/admin.conf
server: https://192.168.80.132:6443
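
Since admin.conf is evidently regenerated when the k0scontroller service restarts, hand-editing it doesn't persist. A non-persistent alternative (a sketch, reusing the same --server override that also appears later in this thread) is to point kubectl at the loopback listener per invocation:

    # Sketch: override the API server URL without editing admin.conf
    k0s kubectl get nodes --server https://localhost:6443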

beforeupgrade-k0s.yaml.log
afterupgrade-k0s.yaml.log

@jsalgado78 jsalgado78 added the bug Something isn't working label Dec 11, 2024
@juanluisvaladas
Contributor

Hi @jsalgado78, the only explanation I can imagine for this is that there are multiple CAs on different control plane nodes.

From the node you gathered these outputs from, please gather the output of:
1- openssl s_client -connect 192.168.80.132:6443 </dev/null | openssl x509 -text
2- openssl s_client -connect 127.0.0.1:6443 </dev/null | openssl x509 -text

We also need to know the list of IP addresses of each controller, and from EACH controller run k0s kc get node --server https://<ip>:6443, trying every IP of every node. We don't really need the outputs, we just need to know what succeeds and what doesn't.
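
For example, a small loop along these lines (a sketch; substitute the full list of controller IPs) covers all the combinations from one node:

    # Hypothetical helper: probe the API server on every controller IP
    for ip in 192.168.60.132 10.230.222.132 192.168.80.132; do
      if k0s kc get node --server "https://$ip:6443" >/dev/null 2>&1; then
        echo "$ip: OK"
      else
        echo "$ip: FAILED"
      fi
    done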

Additionally I'd like from each controller node the following outputs:
1- stat /var/lib/k0s/pki/ca.crt /var/lib/k0s/pki/k0s-api.crt
2- md5sum /var/lib/k0s/pki/ca.crt
3- openssl verify -CAfile /var/lib/k0s/pki/ca.crt /var/lib/k0s/pki/k0s-api.crt

Finally, I need to know if you're using any apiserver extraArgs (these are specified in .spec.api.extraArgs). If you need to redact some private data, redact the value, but we need to know if a flag is defined.
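
As a supplement to the certificate checks above, the SANs on the serving certificate can be printed directly (a sketch assuming OpenSSL 1.1.1 or newer, which RHEL 8 ships):

    # Sketch: list the names/IPs the API server certificate is valid for
    openssl s_client -connect 192.168.80.132:6443 </dev/null 2>/dev/null \
      | openssl x509 -noout -ext subjectAltName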

@jsalgado78
Author

Hi @juanluisvaladas

Controller 1 (hostname s504pre132k0s) IPs: 192.168.60.132, 10.230.222.132, 192.168.80.132
Controller 2 (hostname s504pre133k0s) IPs: 192.168.60.133, 10.230.222.133, 192.168.80.133
Controller 3 (hostname s504pre134k0s) IPs: 192.168.60.134, 10.230.222.134, 192.168.80.134
I use these extraArgs when installing/upgrading with k0sctl (I've attached the config-airgap.yml file used for the upgrade, renamed to config-airgap.txt):

      extraArgs:
        service-node-port-range: 32768-35535

config-airgap.txt

I've attached a log file with the output from each controller server.

controller_s504pre134k0s.log
controller_s504pre133k0s.log
controller_s504pre132k0s.log

Thanks

@juanluisvaladas
Contributor

Hi @jsalgado78
My first guess was wrong. Can you please acquire from one controller:
1- The k0s configuration file, usually /etc/k0s/k0s.yaml. Again feel free to redact sensitive data.
2- ps aux | grep kube-apiserver
Just one controller is enough, no need for all 3 of them.

@jsalgado78
Author

Hi @juanluisvaladas

This is the /etc/k0s/k0s.yaml file on controller 1 (s504pre132k0s), renamed to .txt because of GitHub restrictions.
k0s.yaml.txt

[root@s504pre132k0s /etc/k0s]# ps aux | grep kube-apiserver
kube-ap+ 1661 0.9 4.1 1521100 333496 ? Sl dic12 13:39 /var/lib/k0s/bin/kube-apiserver --authorization-mode=Node,RBAC --service-account-jwks-uri=https://kubernetes.default.svc/openid/v1/jwks --requestheader-client-ca-file=/var/lib/k0s/pki/front-proxy-ca.crt --service-cluster-ip-range=10.96.0.0/12 --api-audiences=https://kubernetes.default.svc,system:konnectivity-server --anonymous-auth=false --service-account-issuer=https://kubernetes.default.svc --tls-cert-file=/var/lib/k0s/pki/server.crt --feature-gates= --advertise-address=192.168.80.132 --v=1 --secure-port=6443 --proxy-client-cert-file=/var/lib/k0s/pki/front-proxy-client.crt --requestheader-username-headers=X-Remote-User --endpoint-reconciler-type=none --proxy-client-key-file=/var/lib/k0s/pki/front-proxy-client.key --requestheader-extra-headers-prefix=X-Remote-Extra- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --tls-min-version=VersionTLS12 --profiling=false --enable-admission-plugins=NodeRestriction --client-ca-file=/var/lib/k0s/pki/ca.crt --kubelet-client-certificate=/var/lib/k0s/pki/apiserver-kubelet-client.crt --requestheader-allowed-names=front-proxy-client --service-account-key-file=/var/lib/k0s/pki/sa.pub --allow-privileged=true --requestheader-group-headers=X-Remote-Group --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --service-account-signing-key-file=/var/lib/k0s/pki/sa.key --kubelet-certificate-authority=/var/lib/k0s/pki/ca.crt --egress-selector-config-file=/var/lib/k0s/konnectivity.conf --service-node-port-range=32768-35535 --enable-bootstrap-token-auth=true --kubelet-client-key=/var/lib/k0s/pki/apiserver-kubelet-client.key --tls-private-key-file=/var/lib/k0s/pki/server.key --etcd-servers=https://127.0.0.1:2379 --etcd-cafile=/var/lib/k0s/pki/etcd/ca.crt --etcd-certfile=/var/lib/k0s/pki/apiserver-etcd-client.crt --etcd-keyfile=/var/lib/k0s/pki/apiserver-etcd-client.key
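
For readability, the flags most relevant to this issue can be isolated from that long command line, e.g. with a sketch like:

    # Sketch: pull the address- and auth-related flags out of the process list
    ps aux | grep '[k]ube-apiserver' | tr ' ' '\n' \
      | grep -E 'advertise-address|anonymous-auth|authorization-mode|client-ca-file'

Note that --advertise-address=192.168.80.132 matches the server URL the regenerated admin.conf points at.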

Thanks

@juanluisvaladas
Contributor

juanluisvaladas commented Dec 16, 2024

Hi @jsalgado78,
I discussed this with the team today and we think the next things to check are:

1- We want to see what /var/lib/k0s/pki/admin.conf looks like. We're particularly interested in the client-certificate-data field. IMPORTANT: Make sure to remove the field client-key-data. This is the private key of the kubeconfig and certainly something you don't want exposed. (See the decoding sketch after this list.)

2- We believe there might be some object in Kubernetes causing this behavior, for instance a validating webhook. Please acquire: kubectl get mutatingwebhookconfigurations,validatingadmissionpolicies,validatingadmissionpolicybindings,validatingwebhookconfigurations -o yaml

3- We released 1.31.3 on Saturday, which includes a new kube-apiserver. If you're able to update to that, we think it's worth giving it a shot just in case...
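
For item 1, one way to inspect the client certificate without sharing the raw file at all (a sketch using standard tools, not a k0s-specific command) is to decode the client-certificate-data field locally:

    # Sketch: print subject/issuer/validity of the embedded admin client cert
    grep client-certificate-data /var/lib/k0s/pki/admin.conf \
      | awk '{print $2}' | base64 -d \
      | openssl x509 -noout -subject -issuer -dates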

@jsalgado78
Author

Hi @juanluisvaladas

This is the admin.conf file.
admin.conf.txt

This is the kubectl get output:
kubectl_get.log

I've tried k0s 1.31.3 with the same results:

[root@s504pre132k0s ~]# k0s status
Version: v1.31.3+k0s.0
Process ID: 1027504
Role: controller
Workloads: false
SingleNode: false

[root@s504pre132k0s ~]# grep server /var/lib/k0s/pki/admin.conf
    server: https://192.168.80.132:6443

[root@s504pre132k0s ~]# k0s kubectl get nodes
E1217 09:57:10.917456    1965 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1217 09:57:10.935703    1965 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1217 09:57:10.973861    1965 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1217 09:57:10.988616    1965 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
E1217 09:57:11.000523    1965 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.80.132:6443/api?timeout=32s\": Forbidden"
Unable to connect to the server: Forbidden


Thanks.

@juanluisvaladas
Contributor

Hi @jsalgado78,
I still don't see anything odd here. I'd like to see the apiserver logs.

Would it be possible for you to gather the output of:
1- k0s kc get node --server https://<ip>:6443 -v 8
2- Immediately after, gather the logs of the k0scontroller around that time

@jsalgado78
Author

Hi @juanluisvaladas

These are the output log files for each IP on controller s504pre132k0s:
output_127.0.0.1.log
output_192.168.60.132.log
output_192.168.80.132.log

Thanks.

@juanluisvaladas
Contributor

Hi @jsalgado78,
Could you attach the k0scontroller logs of 192.168.80.132 between 2024-12-20 12:26:16 and 2024-12-20 12:26:20?
Thanks
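
Assuming the controllers run under systemd as shown earlier in the thread (systemctl restart k0scontroller), a sketch for capturing exactly that window:

    # Sketch: pull the k0scontroller logs for the requested time range
    journalctl -u k0scontroller \
      --since "2024-12-20 12:26:16" --until "2024-12-20 12:26:20"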
