Ensure proper retry and backoff for newly created monitor topics #390

gitlw · 2022-12-19T19:18:44Z

As shown in the conversation at https://linkedin-randd.slack.com/archives/C04FMP0HB17/p1671222219329569,
if a monitoring topic has just been created in a cluster, the AdminClient.describeTopics API can fail with UnknownTopicOrPartitionException, which causes the whole process to crash. Below are the call sites that can trigger the exception (there may be more):

kafka-monitor/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java

Line 455 in 7f99c09

    
           _adminClient.describeTopics(Collections.singleton(_topic)).all().get().get(_topic).partitions();

kafka-monitor/src/main/java/com/linkedin/xinfra/monitor/services/MultiClusterTopicManagementService.java

Line 338 in 7f99c09

    
           List<TopicPartitionInfo> partitions = topicDescriptions.get(_requestTimeout.toMillis(), TimeUnit.MILLISECONDS).partitions();

We need to make sure that the logic calling the describeTopics API has appropriate retries and backoff in case the topic has only just been created.
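One possible shape for this is a small generic helper that retries a call with exponential backoff while the failure is considered retriable. This is only a sketch, not code from the repository: the class name `RetryWithBackoff`, the method `call`, and all parameters are hypothetical, and the attempt count and backoff values would need tuning. It unwraps `ExecutionException` because the futures returned by the AdminClient report failures that way.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.function.Predicate;

/** Hypothetical helper (not part of kafka-monitor): retry a call with exponential backoff. */
public final class RetryWithBackoff {
  private RetryWithBackoff() {
  }

  /**
   * Runs {@code action}, retrying while {@code isRetriable} accepts the failure.
   *
   * @param maxAttempts      total number of attempts, including the first one
   * @param initialBackoffMs sleep before the second attempt
   * @param maxBackoffMs     upper bound for the doubled backoff
   */
  public static <T> T call(Callable<T> action, Predicate<Throwable> isRetriable,
      int maxAttempts, long initialBackoffMs, long maxBackoffMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        // Future.get() wraps the real failure in an ExecutionException, so inspect the cause.
        Throwable cause = (e instanceof ExecutionException && e.getCause() != null) ? e.getCause() : e;
        if (attempt >= maxAttempts || !isRetriable.test(cause)) {
          // Out of attempts, or not a failure we expect to clear up on its own.
          throw e;
        }
        Thread.sleep(backoffMs);
        backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
      }
    }
  }
}
```

At the call sites above, the describeTopics lookup could then be passed in as the action, with a predicate such as `t -> t instanceof UnknownTopicOrPartitionException`, so that only the "topic not visible yet" case is retried and every other error still surfaces immediately.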

github-actions · 2022-12-19T19:19:17Z

This is your first issue in the repository. Thank you for raising this issue.

gitlw assigned mhratson Dec 19, 2022

mhratson mentioned this issue Jan 17, 2023

Prevent crashes by ensuring topics are created #392

Closed
