Intermittent deadlock when closing a channel using CloseAsync in 7.x #1751
Comments
Hi, thanks for the report. As I'm sure you're aware, there's not much to work with here 😸 Obviously, the gold standard is to provide code that reproduces this issue, or at least some idea of the steps to do so.
What does this mean? Do you have some way in your application to increase the frequency of channel closure?
We're running tests that create and close channels very frequently, and it appears that the test suite that does this the most is the one that usually gets stuck. Anyhow, I can try to dig into this further and see if I can provide something that will help you reproduce it. Thanks
@Andersso channel and connection churn are workloads explicitly recommended against.
It would be extremely helpful for you to share your test code. If you can't do that, describe the test as best you can:
My guess is that you could be hitting a
This is a related issue:
Also note that the management UI has connection and channel churn metrics, on the Overview page but also on the node page IIRC. So at the very least it should be easy to see the churn rate: is it 50 channels opened per second? Is it 200?
Describe the bug
Hi there,
Ever since upgrading from 6.x to 7.x, I've been running into intermittent deadlocks whenever I try to close a channel via CloseAsync.
I haven't been able to reproduce it locally. I have been able to do some remote debugging, but it gave me no insight (all thread pool threads are waiting for work).
I did, however, manage to run dotnet-dump dumpasync during one of these deadlocks and got the following info:
First dump
Second dump (another instance)
I noticed that in both dump instances, the stacks aren’t displayed with the usual Awaiting: notation you often see in async stack traces, but it might be normal.
Reproduction steps
I haven’t pinned down a reliable way to reproduce this, but calling CloseAsync more frequently seems to increase the chances of hitting the deadlock. It also appears more common on Linux than Windows, though that might just be due to hardware differences rather than OS behavior.
Expected behavior
When calling CloseAsync, I’d expect the channel to close normally without causing a deadlock.
Additional context
No response