There is a design issue in Cadence that can cause high latency for queryWorkflow. If a query is the first action on a workflow after a long period of inactivity, the query request can take more than 5 seconds.
When worker hosts restart, the sticky tasklist may not be reset, and today there is no mechanism to tell the Cadence server to reset it. Later, when a query task is dispatched, the server still prefers the sticky tasklist. That dispatch eventually times out, the sticky tasklist is reset, and the task is resent to the normal tasklist. As a result, the latency is much higher than usual.
More than three years ago, we introduced stickyTTL in #2261 as a solution: it invalidates the sticky tasklist once the stickyTTL expires. This has proved to mitigate the production issues at Uber. However, because of the potential performance penalty, we did not change the default value.
Another idea is to implement #2369, but this requires a lot of work and we have never prioritized it.
Another approach is to automatically invalidate the sticky tasklist when processing a query task if there has been no active poller for some time, such as 1 minute. This carries much less risk of a performance penalty than the stickyTTL approach.
longquanzheng changed the title from "Automatically invalidate sticky tasklist when processing query task and there is no active poller for some time" to "Query workflow high latency after a long inactive time" on Dec 12, 2021