Query workflow high latency after a long inactive time #4677

Open
longquanzheng opened this issue Dec 10, 2021 · 1 comment
Labels
customer Feature asks from customer improvement Incremental improvement for existing features

Comments

@longquanzheng
Contributor

longquanzheng commented Dec 10, 2021

There is a design issue in Cadence that can cause high queryWorkflow latency. If a workflow query is the first action after a long period of inactivity, the query request can take more than 5 seconds.

When worker hosts restart, the sticky tasklist may not get reset, and today there is no mechanism to tell the Cadence server to make sure it is reset. Later, when dispatching a query task, the server still prioritizes the sticky tasklist. That dispatch eventually times out, and only then does the server reset the sticky tasklist and resend the task to the normal tasklist. As a result, the latency is much higher than usual.
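To make the cost concrete, here is a minimal model of the flow described above. It is a sketch, not Cadence server code: `workflow`, `tasklist`, `dispatchQuery`, `pollLatency` and `errNoPoller` are all hypothetical names, and latency is simulated rather than measured.

```go
package main

import (
	"errors"
	"time"
)

// Hypothetical model for illustration only; none of these names exist in Cadence.

var errNoPoller = errors.New("no poller on tasklist")

// pollLatency is an assumed cost of handing a task to a live poller.
const pollLatency = 10 * time.Millisecond

type tasklist struct {
	name      string
	hasPoller bool // whether a worker is currently polling this tasklist
}

type workflow struct {
	sticky *tasklist // nil means no sticky tasklist is set
	normal tasklist
}

// dispatchQuery tries the sticky tasklist first and falls back to the normal
// tasklist only after the sticky attempt has timed out. It returns the name of
// the tasklist that served the query and the simulated latency. Nothing sleeps.
func (w *workflow) dispatchQuery(stickyTimeout time.Duration) (string, time.Duration, error) {
	var latency time.Duration
	if w.sticky != nil {
		if w.sticky.hasPoller {
			return w.sticky.name, latency + pollLatency, nil
		}
		// Stale sticky tasklist: the whole timeout is spent waiting,
		// and only then is the sticky tasklist reset.
		latency += stickyTimeout
		w.sticky = nil
	}
	if !w.normal.hasPoller {
		return "", latency, errNoPoller
	}
	return w.normal.name, latency + pollLatency, nil
}
```

In this model, with a stale sticky tasklist and an illustrative 5 second timeout, the first query pays the full timeout plus one normal dispatch. The second query only pays the normal dispatch, because the sticky tasklist was reset by the first one.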

3+ years ago, as a solution, we introduced stickyTTL in #2261 to invalidate the sticky tasklist once the stickyTTL has expired. This has proved to mitigate the prod issues at Uber. However, due to the potential perf penalty, we didn't change the default value.
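The stickyTTL idea can be sketched roughly as below. This is not the code from #2261: `executionInfo` and `stickyEnabled` are hypothetical names, and measuring the TTL from the workflow's last update is an assumption made for the sketch.

```go
package main

import "time"

// executionInfo is a hypothetical subset of per-workflow state.
type executionInfo struct {
	stickyTasklist  string        // empty means no sticky tasklist is set
	sinceLastUpdate time.Duration // assumed: time elapsed since the workflow was last updated
}

// stickyEnabled reports whether tasks should still go to the sticky tasklist.
// Once the elapsed time exceeds stickyTTL, the sticky tasklist is treated as
// invalid and the caller should use the normal tasklist instead.
func stickyEnabled(info executionInfo, stickyTTL time.Duration) bool {
	if info.stickyTasklist == "" {
		return false
	}
	return info.sinceLastUpdate <= stickyTTL
}
```

The perf penalty comes from the fact that this check cannot tell a dead worker from an idle one. A short TTL also sends tasks for healthy but quiet workflows to the normal tasklist, where the worker that picks them up may have to replay history instead of using its cached state.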

Another idea is to implement #2369, but this requires a lot of work, and we never prioritized it.

Another approach is to automatically invalidate the sticky tasklist when processing a query task if there has been no active poller on it for some time, e.g. 1 minute. This is much safer than the stickyTTL approach in terms of perf penalty.
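A minimal sketch of that proposal, under assumptions: `stickyState` and `routeQuery` are hypothetical names, and the server is assumed to know how long ago the sticky tasklist was last polled.

```go
package main

import "time"

// stickyState is a hypothetical view of a workflow's sticky tasklist.
type stickyState struct {
	tasklist      string        // empty means no sticky tasklist is set
	sinceLastPoll time.Duration // assumed: time since any worker last polled the sticky tasklist
}

// routeQuery picks the tasklist for a query task. If nothing has polled the
// sticky tasklist for longer than noPollerThreshold, the sticky tasklist is
// invalidated right away and the query goes to the normal tasklist, instead
// of waiting for a sticky dispatch to time out first.
func routeQuery(s *stickyState, normalTasklist string, noPollerThreshold time.Duration) string {
	if s.tasklist == "" {
		return normalTasklist
	}
	if s.sinceLastPoll > noPollerThreshold {
		s.tasklist = "" // invalidate so later tasks also skip the dead sticky tasklist
		return normalTasklist
	}
	return s.tasklist
}
```

Unlike a TTL on the workflow itself, this check only fires when the sticky tasklist has no recent poller, so workflows whose workers are still alive keep their sticky routing.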

@longquanzheng changed the title from "Automatically invalidate sticky tasklist when processing query task and there is no active poller for some time" to "Query workflow high latency after a long inactive time" on Dec 12, 2021
@longquanzheng
Contributor Author

This is fixed in Temporal: temporalio/temporal#2363

@ibarrajo ibarrajo added improvement Incremental improvement for existing features customer Feature asks from customer labels Nov 1, 2024