controller: fix frequent sidecar restarts #12
Merged
Description
Containers in the controller pods would frequently restart:
❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-67c9d7ddb-2hgtv   7/7   Running   35 (89m ago)    6d19h
exoscale-csi-controller-67c9d7ddb-78744   7/7   Running   41 (5h6m ago)   6d19h
exoscale-csi-node-fvr62                   3/3   Running   0               6d19h
exoscale-csi-node-lsx6b                   3/3   Running   0               6d19h
Upon closer inspection we found that only sidecar containers would restart with the same errors at roughly the same time:
Judging by the apiserver logs, it looks like etcd throttled the requests because they were too frequent, which caused the sidecars to restart. To fix this, we double the lease duration, renew deadline and retry period of the leader election for every sidecar.
We also add some missing RBAC rules.
Checklist
(For exoscale contributors)
Testing
The sidecars have now been running for more than two days without restarting, whereas previously they restarted frequently.
❯ kubectl -n kube-system get pods | grep exoscale-csi
exoscale-csi-controller-6d55bf4879-4wjl7   7/7   Running   0   2d17h
exoscale-csi-controller-6d55bf4879-tvb9t   7/7   Running   0   2d17h
exoscale-csi-node-fvr62                    3/3   Running   0   10d
exoscale-csi-node-lsx6b                    3/3   Running   0   10d
I also checked the logs of all the sidecars and couldn't find any more errors.