Query workflow high latency after a long inactive time #4677

longquanzheng · 2021-12-10T06:03:23Z

There is a design issue in Cadence that can cause high queryWorkflow latency. If a workflow query is the first action after a long period of inactivity, the query request can take more than 5 seconds.

When worker hosts restart, the sticky tasklist may not get reset, and today there is no mechanism to tell the Cadence server to reset it. Later, when dispatching a query task, the server still prefers the sticky tasklist. The task eventually times out there, the sticky tasklist is reset, and the task is resent to the normal tasklist. As a result, the latency is much higher than usual.

3+ years ago, as a solution, we introduced stickyTTL in #2261 to invalidate the sticky tasklist once the stickyTTL expires. This has been shown to mitigate the prod issues at Uber. However, due to the potential perf penalty, we didn't change the default value.

Another idea is to implement #2369, but this requires a lot of work, and we never prioritized it.

Another approach is to automatically invalidate the sticky tasklist when processing a query task if there has been no active poller for some time, such as 1 minute. In terms of perf penalty, this is much safer than the stickyTTL approach.

longquanzheng · 2022-05-27T23:01:28Z

This is fixed in Temporal: temporalio/temporal#2363

longquanzheng changed the title ~~Automatically invalidate sticky tasklist when processing query task and there is no active poller for some time~~ Query workflow high latency after a long inactive time Dec 12, 2021

ibarrajo added the improvement and customer labels Nov 1, 2024
