-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make loosen-follower-log-revert
a runtime functionality instead of a feature
#1251
Labels
A-replication
Area: replication protocol
Comments
👋 Thanks for opening this issue! Get help or engage by:
|
The third approach seems feasible to me. This is also a scenario I've encountered, unfortunately :( |
drmingdrmer
added a commit
to drmingdrmer/openraft
that referenced
this issue
Oct 15, 2024
…ation for next detected follower log revert This method requests the RaftCore to allow to reset replication for a specific node when log revert is detected. - `allow=true`: This method instructs the RaftCore to allow the target node's log to revert to a previous state for one time. - `allow=false`: This method instructs the RaftCore to panic if the target node's log revert ### Behavior - If this node is the Leader, it will attempt to replicate logs to the target node from the beginning. - If this node is not the Leader, the request is ignored. - If the target node is not found, the request is ignored. ### Automatic Replication Reset When the `loosen-follower-log-revert` feature flag is enabled, the Leader automatically reset replication if it detects that the target node's log has reverted. This feature is primarily useful in testing environments. ### Production Considerations In production environments, state reversion is a critical issue that should not be automatically handled. However, there may be scenarios where a Follower's data is intentionally removed and needs to rejoin the cluster(without membership changes). In such cases, the Leader should reinitialize replication for that node with the following steps: - Shut down the target node. - call [`Self::allow_next_revert`] on the Leader. - Clear the target node's data directory. - Restart the target node. - Fix: databendlabs#1251
drmingdrmer
added a commit
to drmingdrmer/openraft
that referenced
this issue
Oct 15, 2024
…ation for next detected follower log revert This method requests the RaftCore to allow to reset replication for a specific node when log revert is detected. - `allow=true`: This method instructs the RaftCore to allow the target node's log to revert to a previous state for one time. - `allow=false`: This method instructs the RaftCore to panic if the target node's log revert ### Behavior - If this node is the Leader, it will attempt to replicate logs to the target node from the beginning. - If this node is not the Leader, the request is ignored. - If the target node is not found, the request is ignored. ### Automatic Replication Reset When the `loosen-follower-log-revert` feature flag is enabled, the Leader automatically reset replication if it detects that the target node's log has reverted. This feature is primarily useful in testing environments. ### Production Considerations In production environments, state reversion is a critical issue that should not be automatically handled. However, there may be scenarios where a Follower's data is intentionally removed and needs to rejoin the cluster(without membership changes). In such cases, the Leader should reinitialize replication for that node with the following steps: - Shut down the target node. - call [`Self::allow_next_revert`] on the Leader. - Clear the target node's data directory. - Restart the target node. - Fix: databendlabs#1251
drmingdrmer
added a commit
to drmingdrmer/openraft
that referenced
this issue
Oct 15, 2024
…ation for next detected follower log revert This method requests the RaftCore to allow to reset replication for a specific node when log revert is detected. - `allow=true`: This method instructs the RaftCore to allow the target node's log to revert to a previous state for one time. - `allow=false`: This method instructs the RaftCore to panic if the target node's log revert ### Behavior - If this node is the Leader, it will attempt to replicate logs to the target node from the beginning. - If this node is not the Leader, the request is ignored. - If the target node is not found, the request is ignored. ### Automatic Replication Reset When the `loosen-follower-log-revert` feature flag is enabled, the Leader automatically reset replication if it detects that the target node's log has reverted. This feature is primarily useful in testing environments. ### Production Considerations In production environments, state reversion is a critical issue that should not be automatically handled. However, there may be scenarios where a Follower's data is intentionally removed and needs to rejoin the cluster(without membership changes). In such cases, the Leader should reinitialize replication for that node with the following steps: - Shut down the target node. - call [`Self::allow_next_revert`] on the Leader. - Clear the target node's data directory. - Restart the target node. - Fix: databendlabs#1251
drmingdrmer
added a commit
to drmingdrmer/openraft
that referenced
this issue
Oct 24, 2024
…ation for next detected follower log revert This method requests the RaftCore to allow to reset replication for a specific node when log revert is detected. - `allow=true`: This method instructs the RaftCore to allow the target node's log to revert to a previous state for one time. - `allow=false`: This method instructs the RaftCore to panic if the target node's log revert This method returns `Fatal` error if failed to send the request to RaftCore, e.g. when RaftCore is shut down. Otherwise, it returns a `Ok(Result<_,_>)`, the inner result is: - `Ok(())` if the request is successfully processed, - or `Err(AllowNextRevertError)` explaining why the request is rejected. ### Behavior - If this node is the Leader, it will attempt to replicate logs to the target node from the beginning. - If this node is not the Leader, the request is ignored. - If the target node is not found, the request is ignored. ### Automatic Replication Reset When the `loosen-follower-log-revert` feature flag is enabled, the Leader automatically reset replication if it detects that the target node's log has reverted. This feature is primarily useful in testing environments. ### Production Considerations In production environments, state reversion is a critical issue that should not be automatically handled. However, there may be scenarios where a Follower's data is intentionally removed and needs to rejoin the cluster(without membership changes). In such cases, the Leader should reinitialize replication for that node with the following steps: - Shut down the target node. - call [`Self::allow_next_revert`] on the Leader. - Clear the target node's data directory. - Restart the target node. - Fix: databendlabs#1251
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Currently, we have a feature called
loosen-follower-log-revert
to allow follower's log to be cut. This is normally undesirable, since it shows a problem in the follower implementation and should not be used.However, there are use cases when a node is being recovered in failure case and the log is (partially) reinitialized. This is orchestrated from the outside. In this case, it might be desirable to temporarily allow follower's log to go back.
There are currently a few options how to do this that I can think of:
Opinions? Am I missing something?
The text was updated successfully, but these errors were encountered: