As discussed in the conversation at https://linkedin-randd.slack.com/archives/C04FMP0HB17/p1671222219329569,
if a monitoring topic has just been created in a cluster, the AdminClient.describeTopic API can throw UnknownTopicOrPartitionException, which crashes the whole process. The call sites below can trigger the exception (and there may be more):
kafka-monitor/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java
Line 455 in 7f99c09
kafka-monitor/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java
Line 338 in 7f99c09
We need to make sure that the logic calling the describeTopic API has appropriate retries and backoffs in case it's a topic that's just created.
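One possible shape for the retry logic is sketched below. This is only an illustration under stated assumptions, not the project's implementation: the class `RetryWithBackoff`, its `call` method, and all parameter names are hypothetical, and the helper is deliberately generic so that it does not depend on Kafka classes. It retries an action with exponential backoff while a caller-supplied predicate classifies the failure as retriable, and rethrows the last exception once the attempts are exhausted.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper: retries an action with exponential backoff. */
public final class RetryWithBackoff {

  private RetryWithBackoff() { }

  /**
   * Runs the action until it succeeds, fails with a non-retriable exception,
   * or maxAttempts is reached. The wait doubles after each failed attempt,
   * capped at maxBackoffMs.
   */
  public static <T> T call(Callable<T> action,
                           Predicate<Exception> isRetriable,
                           int maxAttempts,
                           long initialBackoffMs,
                           long maxBackoffMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        // Give up on the last attempt or on failures the caller does not consider transient.
        if (attempt >= maxAttempts || !isRetriable.test(e)) {
          throw e;
        }
        Thread.sleep(backoffMs);
        backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
      }
    }
  }
}
```

At the call sites listed above, the action would presumably wrap the describe call and block on its result, and the predicate would check whether the failure (or its cause, since Kafka futures typically surface errors wrapped in an `ExecutionException`) is an `UnknownTopicOrPartitionException`. Suitable attempt counts and backoff bounds depend on how long topic metadata takes to propagate in the cluster and are left open here.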