Add a caution note about making apply context bounded #18674
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff, main vs. #18674 (... and 24 files with indirect coverage changes):

            main     #18674   +/-
Coverage    68.78%   68.72%   -0.07%
Files       420      420
Lines       35539    35539
Hits        24446    24424    -22
Misses      9665     9680     +15
Partials    1428     1435     +7
//
// CAUTION: The context below should NOT be changed to a bounded value without
// first addressing the risk of a transaction's operations being only partially
// applied when some operations timeout. More details here:
// https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839
//
// CAUTION: The context below should NOT be changed to a bounded value without
// first addressing the risk of a transaction's operations being only partially
// applied when some operations timeout. More details here:
// https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839

//
// CAUTION: Do NOT change the context below to have a timeout (i.e., bounded value).
// The apply workflow may be intentionally interrupted for expected reasons, but it
// should never fail due to non-deterministic factors such as a context timeout. If
// the workflow is interrupted by such factors, it can lead to a scenario where some
// members apply changes successfully while others fail, potentially causing data
// inconsistency issues like https://github.com/etcd-io/etcd/issues/18667.
// See also https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839.
I don't like the idea that adding a comment is meant to protect us from disaster. A context timeout can be assigned at any lower level of the apply loop where this comment is not present, with the same consequences.
I would prefer a slow, methodical removal of the context. The first PR can just remove ctx from the Apply method, which we know uses context.TODO().
> I don't like the idea that adding a comment is meant to protect us from disaster. A context timeout can be assigned at any lower level of the apply loop where this comment is not present, with the same consequences.
It's just an easy, safe, and temporary improvement to the existing situation. Overall it's not a big deal to me, so it doesn't deserve much discussion. Either quickly approve and merge this PR or just reject it.
> The first PR can just remove ctx from the Apply method, which we know uses context.TODO().
I'd suggest evaluating the effort and impact first. Afterwards, we can break it down into PRs.
> I'd suggest evaluating the effort and impact first. Afterwards, we can break it down into PRs.
We already did in #18667 (comment). As I mentioned, the context is used in just two ways: tracing and authorization metadata. Removing an argument from a function is not that risky. If there is a risk, we should mitigate it with testing, but we should not just leave a comment and say we patched the problem.
> We already did in #18667 (comment). As I mentioned, the context is used in just two ways: tracing and authorization metadata.
I don't think a simple comment is enough. I expect a doc or summary that clarifies why (of course, that is already clear), "how" you will resolve it, and the "impact" on the etcdserver. It would be perfect if we could have PoC PRs.
Thanks @shyamjvs for the PR, which makes sense. I just reworded the comment to make it clearer.
Please also sign off the commit; see https://github.com/etcd-io/etcd/pull/18674/checks?check_run_id=31053782708
Following up from #18667 (comment).
/cc @ahrtr @serathius