Add a caution note about making apply context bounded #18674

Closed
wants to merge 1 commit into from

Conversation

shyamjvs
Contributor

@shyamjvs shyamjvs commented Oct 3, 2024

Following up from #18667 (comment).

/cc @ahrtr @serathius

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: shyamjvs
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @shyamjvs. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.72%. Comparing base (c1976a6) to head (a83f274).

Current head a83f274 differs from pull request most recent head 38a42c6

Please upload reports for the commit 38a42c6 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
Files with missing lines Coverage Δ
server/etcdserver/apply/uber_applier.go 86.17% <ø> (ø)

... and 24 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18674      +/-   ##
==========================================
- Coverage   68.78%   68.72%   -0.07%     
==========================================
  Files         420      420              
  Lines       35539    35539              
==========================================
- Hits        24446    24424      -22     
- Misses       9665     9680      +15     
- Partials     1428     1435       +7     

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c1976a6...38a42c6.

Comment on lines +118 to +122
//
// CAUTION: The context below should NOT be changed to a bounded value without
// first addressing the risk of a transaction's operations being only partially
// applied when some operations timeout. More details here:
// https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839
Member

Suggested change
- //
- // CAUTION: The context below should NOT be changed to a bounded value without
- // first addressing the risk of a transaction's operations being only partially
- // applied when some operations timeout. More details here:
- // https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839
+ //
+ // CAUTION: Do NOT change the context below to have a timeout (i.e., bounded value).
+ // The apply workflow may be intentionally interrupted for expected reasons, but it
+ // should never fail due to non-deterministic factors such as a context timeout. If
+ // the workflow is interrupted by such factors, it can lead to a scenario where some
+ // members apply changes successfully while others fail, potentially causing data
+ // inconsistency issues like https://github.com/etcd-io/etcd/issues/18667.
+ // See also https://github.com/etcd-io/etcd/issues/18667#issuecomment-2392286839.

Member

I don't like the idea that adding a comment is meant to protect us from disaster. A context timeout can be assigned at any lower level of the apply loop where this comment is not present, with the same consequences.

I would prefer a slow, methodical removal of the context. The first PR can just remove ctx from the Apply method that we know uses context.TODO().

Member

I don't like the idea that adding a comment is meant to protect us from disaster. A context timeout can be assigned at any lower level of the apply loop where this comment is not present, with the same consequences.

It's just an easy, safe, and temporary improvement to the existing situation. Overall it's not a big deal to me, so it doesn't deserve much discussion. Either quickly approve and merge this PR or just reject it.

The first PR can just remove ctx from the Apply method that we know uses context.TODO().

I'd suggest evaluating the effort and impact first. Afterwards, we can break it down into PRs.

Member

@serathius serathius Oct 4, 2024

I'd suggest evaluating the effort and impact first. Afterwards, we can break it down into PRs.

We already did in #18667 (comment): as I mentioned, the context is used in just 2 ways, tracing and authorization metadata. Removing an argument from a function is not that risky. If there is a risk, we should mitigate it with testing, but we should not just leave a comment and say we patched the problem.
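One possible shape for such a change, sketched in Go under stated assumptions: `applyMeta`, `putRequest`, and `applyPut` are hypothetical names invented for illustration and do not reflect etcd's actual types or signatures. The idea is that if the context only carries tracing and authorization metadata, those two values could be passed explicitly, leaving nothing that can be cancelled or timed out.

```go
package main

import "fmt"

// applyMeta is a hypothetical carrier for the two things the thread says the
// context is used for: tracing and authorization metadata.
type applyMeta struct {
	traceID  string
	username string
}

// putRequest is a hypothetical write request.
type putRequest struct{ key, val string }

// applyPut takes metadata explicitly; there is no context to cancel or time out.
func applyPut(store map[string]string, meta applyMeta, r putRequest) error {
	if meta.username == "" {
		return fmt.Errorf("trace %s: missing auth metadata", meta.traceID)
	}
	store[r.key] = r.val
	return nil
}

func main() {
	store := map[string]string{}
	err := applyPut(store, applyMeta{traceID: "t1", username: "root"}, putRequest{"k", "v"})
	fmt.Println(store["k"], err)
}
```

With this shape, the only failures are ones derived from the request and its metadata, which every member evaluates identically.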

Member

We already did in #18667 (comment): as I mentioned, the context is used in just 2 ways, tracing and authorization metadata.

I don't think a simple comment is enough. I expect a doc or summary to clarify why (of course it's already clear) and "how" you will resolve it, and the "impact" on the etcdserver. It would be perfect if we can have PoC PRs.

@ahrtr
Member

ahrtr commented Oct 4, 2024

Thanks @shyamjvs for the PR, which makes sense.

Just reworded the comment to make it clearer.

@ahrtr
Member

ahrtr commented Oct 4, 2024

Please also sign off the commit; see https://github.com/etcd-io/etcd/pull/18674/checks?check_run_id=31053782708

@ahrtr
Member

ahrtr commented Oct 4, 2024

Closing this one since we are going with another PR, #18675.

Thanks @shyamjvs anyway!

@ahrtr ahrtr closed this Oct 4, 2024
@shyamjvs shyamjvs deleted the add-caution branch October 30, 2024 22:29