
controller: fix frequent sidecar restarts #12

Conversation

@sauterp sauterp (Member) commented Feb 16, 2024

Description

Containers in the controller pod would frequently restart:

❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-67c9d7ddb-2hgtv    7/7     Running   35 (89m ago)    6d19h
exoscale-csi-controller-67c9d7ddb-78744    7/7     Running   41 (5h6m ago)   6d19h
exoscale-csi-node-fvr62                    3/3     Running   0               6d19h
exoscale-csi-node-lsx6b                    3/3     Running   0               6d19h

Upon closer inspection, we found that only the sidecar containers were restarting, all with the same errors at roughly the same time:

I0215 11:52:04.337536       1 leaderelection.go:281] successfully renewed lease kube-system/external-resizer-csi-exoscale-com
E0215 11:52:16.353119       1 leaderelection.go:369] Failed to update lock: etcdserver: request timed out
I0215 11:52:19.341615       1 leaderelection.go:285] failed to renew lease kube-system/external-resizer-csi-exoscale-com: timed out waiting for the condition
F0215 11:52:19.341778       1 leader_election.go:182] stopped leading

Judging by the apiserver logs, it looks like etcd throttled the lease-update requests because they were too frequent; the sidecars then failed to renew their leader-election leases in time, exited, and were restarted. To fix this, we double the lease durations, renew deadlines and retry periods for the leader elections of all sidecars (see the sketch below).
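
As an illustration, here is a minimal sketch of what the timing change could look like on one sidecar, using the standard CSI sidecar leader-election flags. The container entry and the exact values are assumptions (doubling the sidecars' upstream defaults of 15s lease duration, 10s renew deadline and 5s retry period); the actual manifest in this PR may differ.

# Hypothetical excerpt from the controller Deployment; values double the
# sidecars' upstream leader-election defaults (15s / 10s / 5s).
- name: csi-resizer
  args:
    - "--leader-election"
    - "--leader-election-lease-duration=30s"
    - "--leader-election-renew-deadline=20s"
    - "--leader-election-retry-period=10s"

Longer timings give a slow etcd write more room before the renew deadline is missed, at the cost of slightly slower failover when a leader actually goes away.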
We also add some missing RBAC rules.
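
The description does not say which rules were missing. Purely as a hypothetical illustration of the kind of permission involved: the sidecars' leader election is backed by Lease objects, which requires rules along these lines in the controller's Role.

# Hypothetical example only; the Role name and the rules actually added in
# this PR are not stated in the description.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: exoscale-csi-controller-leases   # hypothetical name
  namespace: kube-system
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "watch", "list", "create", "update", "patch"]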

Checklist

(For exoscale contributors)

  • Changelog updated (under Unreleased block)
  • Integration tests OK

Testing

The sidecars have now been running for more than two days without restarting, whereas previously they restarted several times a day:

❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-6d55bf4879-4wjl7   7/7     Running   0          2d17h
exoscale-csi-controller-6d55bf4879-tvb9t   7/7     Running   0          2d17h
exoscale-csi-node-fvr62                    3/3     Running   0          10d
exoscale-csi-node-lsx6b                    3/3     Running   0          10d

I also checked the logs of all the sidecars and couldn't find any more errors.


@sauterp sauterp marked this pull request as ready for review February 19, 2024 10:04
@pierre-emmanuelJ pierre-emmanuelJ (Member) left a comment


👍

@sauterp sauterp merged commit 78eaf5e into main Feb 20, 2024
1 check failed
@sauterp sauterp deleted the philippsauter/sc-86085/gh-issue-exoscale-csi-driver-bug-controller branch February 20, 2024 17:43