Dynamic Cluster Distribution bug: cannot get shard number when the app-controller pod is restarted and the shard ConfigMap already exists #20965
Labels
bug
Something isn't working
component:application-controller
component:sharding
version:2.13
Latest confirmed affected version is 2.13
Describe the bug
With the Dynamic Cluster Distribution feature enabled, I found a serious problem: the application-controller stops processing applications.
After enabling the Dynamic Cluster Distribution feature, I started 3 application-controller (Deployment) replicas, and the shard ConfigMap argocd-app-controller-shard-cm was created. I then restarted one of the replicas. The new application-controller pod did not process applications, and applications were stuck in refreshing. No errors were reported.
The root cause is: when an application-controller pod is restarted, the newly created pod's default shard number is -1. The heartbeat of the shard that belonged to the old pod in the ConfigMap argocd-app-controller-shard-cm has not yet timed out, so the shard number returned by getOrUpdateShardNumberForController is still -1. That -1 is assigned to clusterSharding.Shard, so the controller does not enqueue any application add/update/delete events. clusterSharding.Shard is also not updated afterwards by ReadinessHealthCheck, which only updates the ConfigMap argocd-app-controller-shard-cm.
To Reproduce
Expected behavior
After enabling the Dynamic Cluster Distribution feature, each application-controller shard can process applications normally.
Version
v2.13, v2.12, ...
Logs
I get the following debug logs: