
controller: fix frequent sidecar restarts #12

Conversation

@sauterp sauterp (Member) commented Feb 16, 2024

Description

Containers in the controller pod would frequently restart:

❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-67c9d7ddb-2hgtv    7/7     Running   35 (89m ago)    6d19h
exoscale-csi-controller-67c9d7ddb-78744    7/7     Running   41 (5h6m ago)   6d19h
exoscale-csi-node-fvr62                    3/3     Running   0               6d19h
exoscale-csi-node-lsx6b                    3/3     Running   0               6d19h

Upon closer inspection, we found that only the sidecar containers were restarting, all with the same errors at roughly the same time:

I0215 11:52:04.337536       1 leaderelection.go:281] successfully renewed lease kube-system/external-resizer-csi-exoscale-com
E0215 11:52:16.353119       1 leaderelection.go:369] Failed to update lock: etcdserver: request timed out
I0215 11:52:19.341615       1 leaderelection.go:285] failed to renew lease kube-system/external-resizer-csi-exoscale-com: timed out waiting for the condition
F0215 11:52:19.341778       1 leader_election.go:182] stopped leading

Judging by the apiserver logs, it looks like etcd throttled the lease-update requests because they were too frequent; the sidecars then failed to renew their leader-election leases in time, exited, and were restarted. To fix this, we double the lease durations, renew deadlines and retry periods for the leader elections of all sidecars (see the sketch below).
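
As an illustration, here is a minimal sketch of what the timing change could look like on one sidecar, using the standard CSI sidecar leader-election flags. The container entry and the exact values are assumptions (doubling the sidecars' upstream defaults of 15s lease duration, 10s renew deadline and 5s retry period); the actual manifest in this PR may differ.

# Hypothetical excerpt from the controller Deployment; values double the
# sidecars' upstream leader-election defaults (15s / 10s / 5s).
- name: csi-resizer
  args:
    - "--leader-election"
    - "--leader-election-lease-duration=30s"
    - "--leader-election-renew-deadline=20s"
    - "--leader-election-retry-period=10s"

Longer timings give a slow etcd write more room before the renew deadline is missed, at the cost of slightly slower failover when a leader actually goes away.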
We also add some missing RBAC rules.
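
The description does not say which rules were missing. Purely as a hypothetical illustration of the kind of permission involved: the sidecars' leader election is backed by Lease objects, which requires rules along these lines in the controller's Role.

# Hypothetical example only; the Role name and the rules actually added in
# this PR are not stated in the description.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: exoscale-csi-controller-leases   # hypothetical name
  namespace: kube-system
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "watch", "list", "create", "update", "patch"]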

Checklist

(For exoscale contributors)

  • Changelog updated (under Unreleased block)
  • Integration tests OK

Testing

The sidecars have now been running for more than two days without restarting, whereas previously they restarted several times a day:

❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-6d55bf4879-4wjl7   7/7     Running   0          2d17h
exoscale-csi-controller-6d55bf4879-tvb9t   7/7     Running   0          2d17h
exoscale-csi-node-fvr62                    3/3     Running   0          10d
exoscale-csi-node-lsx6b                    3/3     Running   0          10d

I also checked the logs of all the sidecars and couldn't find any more errors.


@sauterp sauterp marked this pull request as ready for review February 19, 2024 10:04
@pierre-emmanuelJ pierre-emmanuelJ (Member) left a comment


👍

@sauterp sauterp merged commit 78eaf5e into main Feb 20, 2024
1 check failed
@sauterp sauterp deleted the philippsauter/sc-86085/gh-issue-exoscale-csi-driver-bug-controller branch February 20, 2024 17:43