
[WIP] Add and distribute IQv2 information in KIP-1071 #18278

Open

bbejeck wants to merge 4 commits into base: kip1071

Conversation

@bbejeck bbejeck commented Dec 19, 2024

This PR adds IQv2 information to the StreamsGroupHeartbeat response.

For testing, all IQv2 related integration tests have been updated (and are passing) to use the new streams protocol to exercise the newly added code. A specific integration test and unit tests are coming.

Among the tests updated:

  1. ConsistencyVectorIntegrationTest
  2. EosIntegrationTest
  3. IQv2IntegrationTest
  4. IQv2StoreIntegrationTest
  5. IQv2VersionedStoreIntegrationTest
  6. LagFetchIntegrationTest
  7. OptimizedKTableIntegrationTest
  8. PositionRestartIntegrationTest
  9. QueryableStateIntegrationTest
  10. StoreQueryIntegrationTest

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)
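
As a rough illustration of the idea, the heartbeat response ends up carrying a mapping from each Streams instance's endpoint to the topic partitions it hosts, which is what IQv2 query routing needs. The sketch below uses hypothetical, simplified types (the real fields live in StreamsGroupHeartbeatResponseData), not the actual Kafka classes:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EndpointToPartitionsSketch {
    // Simplified stand-in for an endpoint (host:port of a Streams instance).
    record Endpoint(String host, int port) {}

    // Simplified stand-in for a topic partition.
    record TopicPartition(String topic, int partition) {}

    public static void main(String[] args) {
        // Each group member learns which endpoint hosts which partitions,
        // so an IQv2 query for a given key can be routed to the right instance.
        Map<Endpoint, List<TopicPartition>> endpointToPartitions = new LinkedHashMap<>();
        endpointToPartitions.put(new Endpoint("localhost", 8080),
                List.of(new TopicPartition("input", 0), new TopicPartition("input", 1)));
        endpointToPartitions.forEach((endpoint, partitions) ->
                System.out.println(endpoint.host() + ":" + endpoint.port()
                        + " -> " + partitions.size() + " partitions"));
        // prints: localhost:8080 -> 2 partitions
    }
}
```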

@@ -110,6 +110,7 @@
import org.apache.kafka.image.MetadataImage;
import org.apache.kafka.image.MetadataProvenance;
import org.apache.kafka.server.common.MetadataVersion;

Member Author:

checkstyle fix

@@ -18,6 +18,7 @@
package org.apache.kafka.coordinator.group.taskassignor;

import org.apache.kafka.coordinator.group.GroupCoordinatorConfig;

Member Author:

checkstyle fix

@@ -173,7 +179,7 @@ public void afterTest() {

@AfterAll
public static void after() {
CLUSTER.stop();
cluster.stop();
Member Author:

Checkstyle fix; this applies to all CLUSTER -> cluster updates.

for (Map.Entry<String, Set<Integer>> taskEntry : taskEntrySet) {
String subtopologyId = taskEntry.getKey();
List<Integer> partitions = new ArrayList<>(taskEntry.getValue());
ConfiguredSubtopology configuredSubtopology = group.configuredTopology().subtopologies().get(subtopologyId);
Member Author:

Went with the ConfiguredSubtopology since it contains any resolved regex topics.

Member:

Yes, that is correct.

@@ -2267,7 +2269,26 @@ private CoordinatorResult<StreamsGroupHeartbeatResult, CoordinatorRecord> stream
);
}

// 3. Determine the partition metadata and any internal topics if needed.
// Build the endpoint to topic partition information
Member Author:

Until I flesh out how to send the information only to the group members that don't have it, this seems like the correct spot to build the endpointsToPartitions.

Member:

Yes, this is the right place. I would maybe move the code to the end of this function (since everything related to "building the right response" is towards the end of the function). Also, consider extracting the logic into a separate method, since this one is already very long.

@lucasbru (Member) left a comment:

Hey Bill! This is coming along well. I left some comments; I think there are some important details that we need to take care of in the code.

@@ -2038,7 +2043,7 @@ private static Properties streamsConfiguration(final boolean cache, final boolea
config.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.OPTIMIZE);
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "app-" + safeTestName);
config.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "localhost:" + (++port));
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, cluster.bootstrapServers());
Member:

These integration tests don't seem to use the new protocol yet. You need to set GROUP_PROTOCOL.

Member Author:

Thanks for the pointer, I missed that one; I'll parameterize that as well.


@BeforeAll
public static void startCluster() throws IOException {
CLUSTER.start();
final Properties props = new Properties();
props.setProperty(GroupCoordinatorConfig.GROUP_COORDINATOR_REBALANCE_PROTOCOLS_CONFIG, "classic,consumer,streams");
Member:

How do you imagine this working on trunk? I guess we should parametrize the tests to test both the old and the new protocol?

Member Author:

Yes, I'll parameterize the tests. I took a shortcut to make sure everything worked.
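
Such a parameterization could look roughly like the plain-Java sketch below. This is only an illustration of the idea (the real tests would use JUnit's parameterized-test support, and the literal config key for GROUP_COORDINATOR_REBALANCE_PROTOCOLS_CONFIG is an assumption here):

```java
import java.util.List;
import java.util.Properties;

public class ProtocolParameterizationSketch {
    // Builds broker-side properties enabling the given rebalance protocols.
    // The config key string is assumed for illustration.
    static Properties coordinatorConfig(String protocols) {
        Properties props = new Properties();
        props.setProperty("group.coordinator.rebalance.protocols", protocols);
        return props;
    }

    public static void main(String[] args) {
        // Run the same test body once per protocol under test, so both the
        // classic and the new streams protocol paths are exercised.
        for (String protocol : List.of("classic", "streams")) {
            Properties props = coordinatorConfig("classic,consumer," + protocol);
            System.out.println("testing with protocols: "
                    + props.getProperty("group.coordinator.rebalance.protocols"));
        }
    }
}
```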

@@ -2267,7 +2269,26 @@ private CoordinatorResult<StreamsGroupHeartbeatResult, CoordinatorRecord> stream
);
}

// 3. Determine the partition metadata and any internal topics if needed.
// Build the endpoint to topic partition information
final Map<String, org.apache.kafka.coordinator.group.streams.Assignment> assignmentMap = group.targetAssignment();
Member:

I wonder if it would be better to use the "current assignment" instead of the target assignment here. The current assignment will more closely model what is currently available on the node. You can find the current assignment in StreamsGroupMember.assignedActiveTasks, etc.

I also considered deriving the topic partitions from the reported taskOffsets instead of the assignment, as those are precisely the tasks that are available on the node. However, in the original StreamsPartitionAssignor (see populatePartitionsByHostMaps), we seem to just take the union of active and standby tasks. I discussed this briefly with Bruno, and it seems this optimization wouldn't really be worth it; just leaving this here in case you have a different opinion.
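
The union approach mentioned above can be sketched as follows. The map shapes (subtopology id to partition set) are assumptions for illustration, not the actual Kafka types:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TaskUnionSketch {
    // Per-subtopology union of active and standby tasks, mirroring the idea
    // behind populatePartitionsByHostMaps in the original assignor.
    static Map<String, Set<Integer>> unionTasks(Map<String, Set<Integer>> activeTasks,
                                                Map<String, Set<Integer>> standbyTasks) {
        Map<String, Set<Integer>> union = new TreeMap<>();
        activeTasks.forEach((subtopology, partitions) ->
                union.computeIfAbsent(subtopology, k -> new TreeSet<>()).addAll(partitions));
        standbyTasks.forEach((subtopology, partitions) ->
                union.computeIfAbsent(subtopology, k -> new TreeSet<>()).addAll(partitions));
        return union;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> active = Map.of("sub-0", Set.of(0, 1));
        Map<String, Set<Integer>> standby = Map.of("sub-0", Set.of(2), "sub-1", Set.of(0));
        System.out.println(unionTasks(active, standby));
        // prints: {sub-0=[0, 1, 2], sub-1=[0]}
    }
}
```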

List<Integer> partitions = new ArrayList<>(taskEntry.getValue());
ConfiguredSubtopology configuredSubtopology = group.configuredTopology().subtopologies().get(subtopologyId);
if (configuredSubtopology != null) {
List<StreamsGroupHeartbeatResponseData.TopicPartition> topicPartitions = Stream.concat(
Member:

I think there is a corner case that needs to be handled. If you look at PartitionGrouper in the original StreamsPartitionAssignor code, there may be copartitioning groups with different numbers of partitions within the same subtopology. So while the number of tasks is always equal to the maximal number of partitions across all source topics, there may be a subset of the source topics that has a lower number of partitions. The example would be a subtopology that merges topic A with 1 partition and topic B with 2 partitions. We will get tasks 0 and 1, but we need to make sure that the topic partitions for task 0 are {A_0,B_0} and the topic partitions for task 1 are {B_1} (because topic A does not have a second partition).

To handle this corner case, it's probably enough to remove all elements of partitions that exceed the number of partitions in the topic.

I'd suggest we add a little class for this logic, so that we can test these corner cases independently of GroupMetadataManager.
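
A minimal sketch of that filtering, using hypothetical names (the real implementation would take its partition counts from the configured topology):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CopartitionFilterSketch {
    // For a given task index, keep only the source topics that actually have
    // that partition. Example from the review: topic A has 1 partition and
    // topic B has 2, so task 0 maps to {A_0, B_0} but task 1 maps only to {B_1}.
    static List<String> topicPartitionsForTask(int task, Map<String, Integer> partitionCounts) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(partitionCounts).entrySet()) {
            if (task < entry.getValue()) {
                result.add(entry.getKey() + "_" + task);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("A", 1, "B", 2);
        System.out.println(topicPartitionsForTask(0, counts)); // [A_0, B_0]
        System.out.println(topicPartitionsForTask(1, counts)); // [B_1]
    }
}
```

Keeping this in a small standalone class, as suggested, makes the copartitioning corner cases easy to unit-test without spinning up the group coordinator.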
