Change detection issues for the --enable-worker k0s controller install flag #801

Closed
NeonSludge opened this issue Nov 29, 2024 · 5 comments

Comments

@NeonSludge

k0sctl 0.19.4 appears to incorrectly detect changes in the --enable-worker k0s controller install flag value, which leads to unexpected controller restarts in the reinstall phase.

Here is an example of what that looks like in k0sctl's debug logs:

[ssh] <controller IP 1>:22: installFlags seem to have changed. existing: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker:true --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 1> --hostname-override=<controller hostname 1> --profile:custom] new: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker: --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 1> --hostname-override=<controller hostname 1> --profile:custom]
[ssh] <controller IP 2>:22: installFlags seem to have changed. existing: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker:true --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 2> --hostname-override=<controller hostname 2> --profile:custom --token-file:/etc/k0s/k0stoken] new: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker: --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 2> --hostname-override=<controller hostname 2> --profile:custom --token-file:/etc/k0s/k0stoken]
[ssh] <controller IP 3>:22: installFlags seem to have changed. existing: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker:true --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 3> --hostname-override=<controller hostname 3> --profile:custom --token-file:/etc/k0s/k0stoken] new: map[--config:/etc/k0s/k0s.yaml --cri-socket:remote:/var/run/containerd/containerd.sock --data-dir:/var/lib/k0s --disable-components:konnectivity-server,metrics-server --enable-worker: --iptables-mode:nft --kubelet-extra-args:--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<controller IP 3> --hostname-override=<controller hostname 3> --profile:custom --token-file:/etc/k0s/k0stoken]

Notice that k0sctl detects changes in the --enable-worker flag value (--enable-worker:true --> --enable-worker:). This change by itself would not be significant, but the problem is that the k0s install controller command seems to always produce a flag value when rendering flags (even for boolean flags):

  1. The k0s install controller command prepares controller command-line arguments (the cmdFlagsToArgs function):
    https://github.com/k0sproject/k0s/blob/f74eefd01bad2764758bf57450912b0e1de68993/cmd/install/controller.go#L28-L67
    https://github.com/k0sproject/k0s/blob/f74eefd01bad2764758bf57450912b0e1de68993/cmd/install/util.go#L30-L53
  2. These arguments are then passed along and used in the ExecStart directive of the controller systemd unit file:
    https://github.com/k0sproject/k0s/blob/f74eefd01bad2764758bf57450912b0e1de68993/cmd/install/install.go#L56-L78
    https://github.com/k0sproject/k0s/blob/f74eefd01bad2764758bf57450912b0e1de68993/pkg/install/service.go#L69-L130

This can be confirmed by examining the generated systemd unit file and the k0s status output on one of the controllers:

ExecStart=/usr/local/bin/k0s controller --config=/etc/k0s/k0s.yaml --cri-socket=remote:/var/run/containerd/containerd.sock --data-dir=/var/lib/k0s --disable-components=konnectivity-server,metrics-server --enable-worker=true --iptables-mode=nft --kubelet-extra-args=--pod-infra-container-image=k8s.cache/pause:3.9\x20--node-ip=<...>\x20--hostname-override=<...> --profile=custom
# k0s status -o json | jq .Args
[
  "/usr/local/bin/k0s",
  "controller",
  "--config=/etc/k0s/k0s.yaml",
  "--cri-socket=remote:/var/run/containerd/containerd.sock",
  "--data-dir=/var/lib/k0s",
  "--disable-components=konnectivity-server,metrics-server",
  "--enable-worker=true",
  "--iptables-mode=nft",
  "--kubelet-extra-args=--pod-infra-container-image=k8s.cache/pause:3.9 --node-ip=<...> --hostname-override=<...>",
  "--profile=custom"
]
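The rendering behaviour described above can be approximated with a short Go sketch. It uses Go's standard `flag` package as a stand-in for the CLI library k0s is actually built on, and the function name `renderControllerArgs` is hypothetical: every flag that was explicitly set is written back as `--name=value`, and a boolean flag's value prints as `true`, so a bare `--enable-worker` comes out as `--enable-worker=true`.

```go
package main

import (
	"flag"
	"fmt"
)

// renderControllerArgs (hypothetical) parses argv against a tiny flag set
// and re-renders every explicitly set flag as "--name=value".
// Boolean flags go through Value.String(), which yields "true".
func renderControllerArgs(argv []string) ([]string, error) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.Bool("enable-worker", false, "also run a worker on this controller")
	fs.String("data-dir", "", "data directory")

	if err := fs.Parse(argv); err != nil {
		return nil, err
	}

	var args []string
	// Visit only walks flags that were set, in lexicographical order.
	fs.Visit(func(f *flag.Flag) {
		args = append(args, fmt.Sprintf("--%s=%s", f.Name, f.Value.String()))
	})
	return args, nil
}

func main() {
	// The boolean flag is passed bare, the way k0sctl passes it.
	args, err := renderControllerArgs([]string{"--enable-worker", "--data-dir=/var/lib/k0s"})
	if err != nil {
		panic(err)
	}
	fmt.Println(args) // [--data-dir=/var/lib/k0s --enable-worker=true]
}
```

The point is only that the value is always materialized on output; whatever k0s does internally, the unit file and `k0s status` show `--enable-worker=true`.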

Meanwhile, k0sctl uses this flag without a value here:

func (h *Host) K0sInstallFlags() (Flags, error) {
	flags := Flags(h.InstallFlags)
	flags.AddOrReplace(fmt.Sprintf("--data-dir=%s", shellescape.Quote(h.K0sDataDir())))
	switch h.Role {
	case "controller+worker":
		// Boolean flags are added bare, without "=true"
		flags.AddUnlessExist("--enable-worker")
		if h.NoTaints {
			flags.AddUnlessExist("--no-taints")
		}
	// ... (remaining cases omitted)
	}
	// ... (rest of the function omitted)
}

If I'm understanding the change detection logic correctly, all of this means that this change will get detected every time k0sctl applies a cluster manifest.
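The debug log prints both sides as `map[--name:value ...]`, which suggests a key/value comparison. A minimal sketch of that kind of comparison shows why the mismatch would recur on every apply. The helpers `parseFlags` and `changedFlags` are hypothetical and are not k0sctl's actual code; the assumption is simply that each argument is split at its first `=` and a bare flag gets an empty value.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseFlags (hypothetical) splits "--name=value" strings at the first "="
// into a map; a flag without "=" gets an empty value, as in the log.
func parseFlags(args []string) map[string]string {
	m := make(map[string]string, len(args))
	for _, a := range args {
		name, value, _ := strings.Cut(a, "=")
		m[name] = value
	}
	return m
}

// changedFlags (hypothetical) returns, sorted, the names whose values differ
// or that are present on only one side.
func changedFlags(existing, desired map[string]string) []string {
	var changed []string
	for k, v := range existing {
		if dv, ok := desired[k]; !ok || dv != v {
			changed = append(changed, k)
		}
	}
	for k := range desired {
		if _, ok := existing[k]; !ok {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// What the running controller reports (values rendered by k0s).
	existing := parseFlags([]string{"--data-dir=/var/lib/k0s", "--enable-worker=true", "--no-taints=true"})
	// What the config asks for, with boolean flags written without a value.
	desired := parseFlags([]string{"--data-dir=/var/lib/k0s", "--enable-worker", "--no-taints"})
	fmt.Println(changedFlags(existing, desired)) // [--enable-worker --no-taints]
}
```

Because the installed side is always re-rendered with `=true` and the desired side never has it, the two maps can never match under this kind of comparison.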

@karlivory

karlivory commented Nov 30, 2024

I'm encountering the same issue. Running k0sctl apply with

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k8s-dev
spec:
  k0s:
    version: v1.31.2+k0s.0
  hosts:
  - ssh:
      address: <omitted>
      user: root
      port: 22
    role: controller+worker
    noTaints: true

triggers a reinstall on the node every time. Running with --debug, I also see the problematic log line:

DEBU[0002] [ssh] <omitted>:22: installFlags seem to have changed. existing: map[--config:/etc/k0s/k0s.yaml --data-dir:/var/lib/k0s --enable-worker:true --kubelet-extra-args:--node-ip=10.166.0.4 --no-taints:true] new: map[--config:/etc/k0s/k0s.yaml --data-dir:/var/lib/k0s --enable-worker: --kubelet-extra-args:--node-ip=10.166.0.4 --no-taints:]

Maybe there should be an idempotency test to catch these issues?
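One way such an idempotency check could look: normalize both flag sets so that a bare boolean flag means `true`, then assert that a second apply with the same config sees no change. This is a sketch under stated assumptions, not necessarily how the v0.20.0 fix works; `boolFlags`, `normalize` and `flagsChanged` are hypothetical, and the boolean list contains only the two flags seen in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// boolFlags is an assumption for illustration, not a list from k0s or k0sctl.
var boolFlags = map[string]bool{"--enable-worker": true, "--no-taints": true}

// normalize parses "--name[=value]" strings into a map and writes bare
// boolean flags as "true", matching how k0s renders them.
func normalize(args []string) map[string]string {
	m := make(map[string]string, len(args))
	for _, a := range args {
		name, value, hasValue := strings.Cut(a, "=")
		if !hasValue && boolFlags[name] {
			value = "true"
		}
		m[name] = value
	}
	return m
}

// flagsChanged reports whether the two flag sets differ after normalization.
func flagsChanged(existing, desired []string) bool {
	e, d := normalize(existing), normalize(desired)
	if len(e) != len(d) {
		return true
	}
	for k, v := range e {
		if dv, ok := d[k]; !ok || dv != v {
			return true
		}
	}
	return false
}

func main() {
	desired := []string{"--data-dir=/var/lib/k0s", "--enable-worker", "--no-taints"}
	// Simulates what k0s reports back after the first apply.
	installed := []string{"--data-dir=/var/lib/k0s", "--enable-worker=true", "--no-taints=true"}
	// Idempotency: re-applying the same config must not detect a change.
	fmt.Println(flagsChanged(installed, desired)) // false
}
```

A real test would run the whole apply twice against a test host, but even this unit-level check would have caught the `--enable-worker:true` vs `--enable-worker:` mismatch.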

@artem-zinnatullin

I just lost all cluster workers during a simple cluster version upgrade, likely due to this issue combined with #722.

This is so far from a "zero friction k8s" experience. Honestly, I've never seen issues in the large production k8s installations I maintain anywhere near what I see with k0s in my small 3-node smart home install: every time I touch anything in k0sctl.yaml, I'm in for 1-2 evenings of crazy debugging. I know it's open source and I'd better contribute than complain, but the reality is still real, and my small PRs won't change the philosophy behind the docs and the k0s and k0sctl tools.

@kke
Contributor

kke commented Dec 2, 2024

Additional problems can be anticipated because of the path absolutization and the dropping of --env and --force in

https://github.com/k0sproject/k0s/blob/f74eefd01bad2764758bf57450912b0e1de68993/cmd/install/util.go#L30-L53

I have taken this into consideration in #803

@kke
Contributor

kke commented Dec 9, 2024

v0.20.0 now includes potential improvements for this.

@NeonSludge
Copy link
Author

I have tested the v0.20.0 version of k0sctl and can confirm that it no longer detects bogus changes in the --enable-worker flag. Thank you for the fix!

@kke kke closed this as completed Dec 10, 2024