-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the cross-zone-based stoppable check #2680
Implement the cross-zone-based stoppable check #2680
Conversation
…tances parameter to the Stoppable API
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java
Outdated
Show resolved
Hide resolved
helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstancesAccessor.java
Show resolved
Hide resolved
helix-rest/src/main/java/org/apache/helix/rest/server/resources/helix/InstancesAccessor.java
Show resolved
Hide resolved
...rc/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on backwards incompatible changes, design of selector, and another thing to consider for custom partition health check.
You can change public method signatures in the selector since it isn't merged to master yet, but we should be careful about doing that in MaintenanceManagementService
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java
Show resolved
Hide resolved
helix-core/src/test/java/org/apache/helix/util/TestInstanceValidationUtil.java
Outdated
Show resolved
Hide resolved
@@ -745,7 +745,7 @@ protected Map<String, Boolean> getInstanceHealthStatus(String clusterId, String | |||
break; | |||
case MIN_ACTIVE_REPLICA_CHECK_FAILED: | |||
healthStatus.put(HealthCheck.MIN_ACTIVE_REPLICA_CHECK_FAILED.name(), | |||
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor, instanceName)); | |||
InstanceValidationUtil.siblingNodesActiveReplicaCheck(_dataAccessor, instanceName, toBeStoppedInstances)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessarily a part of the change, but I think we should re-evaluate whether the EMPTY_RESOURCE_ASSIGNMENT
should keep an instance from being stoppable. There could be legitimate reason there is no resource assignment. Why should that block a deployment or other operation?
.../main/java/org/apache/helix/rest/clusterMaintenanceService/MaintenanceManagementService.java
Show resolved
Hide resolved
...rc/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java
Outdated
Show resolved
Hide resolved
…ck to bypass the unhealth partitions on the to_be_stopped_instances
@@ -114,7 +114,7 @@ public MockMaintenanceManagementService(ZKHelixDataAccessor dataAccessor, | |||
|
|||
@Override | |||
protected Map<String, Boolean> getInstanceHealthStatus(String clusterId, String instanceName, | |||
List<HealthCheck> healthChecks) { | |||
List<HealthCheck> healthChecks, Set<String> toBeStoppedInstances) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we still need to change this override is the getInstanceStoppableCheck
will eventually call protected Map<String, Boolean> getInstanceHealthStatus(String clusterId, String instanceName, List<HealthCheck> healthChecks, Set<String> toBeStoppedInstances)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. One minor comment
...rc/main/java/org/apache/helix/rest/clusterMaintenanceService/StoppableInstancesSelector.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/util/InstanceValidationUtil.java
Outdated
Show resolved
Hide resolved
Thanks @junkaixue @xyuanlu @zpinto for reviewing the PR. It's approved by @xyuanlu @zpinto and is ready to merge. |
Implement the cross-zone-based stoppable check and add to_be_stopped_instances query parameter to the stoppable check API
Implement the cross-zone-based stoppable check and add to_be_stopped_instances query parameter to the stoppable check API
Implement the cross-zone-based stoppable check and add to_be_stopped_instances query parameter to the stoppable check API
Implement the cross-zone-based stoppable check and add to_be_stopped_instances query parameter to the stoppable check API
Issues
Cross-zone Stoppable Check #2655
Description
to_be_stopped_instances
. There are two checks that will be affected:siblingNodesActiveReplicaCheck
perPartitionHealthCheck
Tests
mvn test -Dtest=TestInstancesAccessor,TestMaintenanceManagementService,TestInstanceValidationUtilInRest,TestPerInstanceAccessor -pl helix-rest && mvn test -Dtest=TestInstanceValidationUtil -pl helix-core
(List the names of added unit/integration tests)
(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
Changes that Break Backward Compatibility (Optional)
(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)