ObjectDisposedException when connection is closed from the server #1749

ramonsmits · 2024-12-16T16:41:39Z

Describe the bug

Logging FirstChanceExceptions shows some strange ObjectDisposedException:

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v7.0.0/projects/RabbitMQ.Client/Impl/Connection.Heartbeat.cs#L164 accessing .Token here can result in an ObjectDisposedException. That is catched, but the comment states that its for the timer

The SetSessionClosingAsync uses a _closingSemaphore that can already be disposed. It seems this isn't gracefully handled in the stacktrace.

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v7.0.0/projects/RabbitMQ.Client/Impl/MainSession.cs

I'm not sure if this is causing issues but looking at the code it might be needed that these code paths all need to have exception handling to prevent code not to terminate if these ObjectDisposedExceptions are to be expected.

Reproduction steps

Use MassTransit.RabbitMQ 8.3.4
Log FirstChanceException
Create a consumer and connect to the broker
Close the connections on the broker
Observe the consumer trying to reconnect

Do this a couple of times and you'll see the related exception

Expected behavior

gracefully handle ObjectDisposedException on all code paths for closed connections

Additional context

No response

The text was updated successfully, but these errors were encountered:

ZajacPiotr98 · 2024-12-19T16:37:48Z

I encountered similar problem, but kind of backwards, because this ObjectDisposedException caused closing connection with such close reason.

CloseReason: AMQP close-reason, initiated by Library, code=541, text='Cannot access a disposed object. Object name: 'System.Threading.SemaphoreSlim'.', classId=0, methodId=0, exception=System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Threading.SemaphoreSlim'. at System.Threading.SemaphoreSlim.WaitAsync(Int32 millisecondsTimeout, CancellationToken cancellationToken) at RabbitMQ.Client.Impl.Channel.MaybeHandlePublisherConfirmationTcsOnChannelShutdownAsync(ShutdownEventArgs reason) at RabbitMQ.Client.Impl.Channel.OnChannelShutdownAsync(ShutdownEventArgs reason) at RabbitMQ.Client.Impl.Channel.OnSessionShutdownAsync(Object sender, ShutdownEventArgs reason) at RabbitMQ.Client.Impl.AsyncEventingWrapper1.InternalInvoke(Delegate[] handlers, Object sender, TEvent event)    at RabbitMQ.Client.Impl.AsyncEventingWrapper1.InternalInvoke(Delegate[] handlers, Object sender, TEvent event) at RabbitMQ.Client.Impl.Channel.HandleChannelCloseAsync(IncomingCommand cmd, CancellationToken cancellationToken) at RabbitMQ.Client.Impl.Channel.HandleCommandAsync(IncomingCommand cmd, CancellationToken cancellationToken) at RabbitMQ.Client.Framing.Connection.ProcessFrameAsync(InboundFrame frame, CancellationToken cancellationToken) at RabbitMQ.Client.Framing.Connection.ReceiveLoopAsync(CancellationToken mainLoopCancellationToken) at RabbitMQ.Client.Framing.Connection.MainLoop()

The error happened when there was significant load and RabbitMQ responses were several times slower than usual. But I haven't got to the bottom of issue and I don't know the root cause. I'm not sure if it's library itself or usage of the library causing the error, but I can't find any documentation about handling this.
The code that I use tries to close a channel before disposal, but only when channel IsOpen and the task is awaited until timeout is reached:
Plain Text
IChannel Instance
...
var timeoutCts = new CancellationTokenSource(timeout);
var closeChannelTask = this.Instance.CloseAsync();
await closeChannelTask.WaitAsync(timeoutCts.Token);

Without this timeout some of close requests were running for really long (maybe even infinitely? Some deadlock? There was no channel on server side), so the timeout was added to unblock the thread.

To be sure I also checked without running CloseAsync on IChannel at all, but it haven't helped.

Fixes #1749 * Add test project

lukebakken · 2024-12-19T19:37:40Z

@ramonsmits I still have yet to hit an ObjectDisposedException using my test project. If you could provide the entire stack trace, it would help me out.

Here is my test application:
https://github.com/rabbitmq/rabbitmq-dotnet-client/tree/rabbitmq-dotnet-client-1749/projects/Applications/GH-1749

It starts a consumer, then repeatedly uses the HTTP API to close the connection. I have left it running a long time in my test environment, and have yet to see ObjectDisposedException.

Maybe there's something up with my code? If you can take a look, I'd appreciate it.

ZajacPiotr98 · 2024-12-20T10:34:22Z

I managed to reproduce the problem I described. I called DisposeAsync method twice simultaneously and extended Close response time using clumsy. First Dispose invocation called CloseAsync and it set CloseReason, so IsClosed was true. The second invocation went straight to the semaphores disposing. When Close response was returned, the semaphore was already disposed. It thrown ObjectDisposedException and closed entire Connection.

I think the ObjectDisposedException is the reason of that and a bug. However I'm not sure if closing connection on any exception is good approach. Probably there should be distinction between some server exceptions or any other that should cause it, and the exceptions thrown by bugs in the implementation.

michaelklishin · 2024-12-20T15:34:21Z

For I/O exceptions, it is. And it can and does trigger a connection recovery.

For when a connection is closed, the methods should simply be made idempotent. And Dispose should likely be an equivalent of calling CloseAsync the first time, and equally idempotent the 2nd, 3rd time, and so on.

lukebakken · 2024-12-20T17:22:06Z

For when a connection is closed, the methods should simply be made idempotent. And Dispose should likely be an equivalent of calling CloseAsync the first time, and equally idempotent the 2nd, 3rd time, and so on.

Yep, that's how it's supposed to work 😉

Thank you for the details, @ZajacPiotr98. Since you can reproduce this issue, would you be able to test a pull request once I open it?

Fixes #1749

Fixes #1749 * Add test project * Add `FirstChanceException` logging. * Add code to repeatedly close connection. * Allow passing hostname to GH-1749 program as the first arg.

ZajacPiotr98 · 2024-12-23T12:49:29Z

I tested these changes and it's not resolving problem for the case I run. It was simple test:

var task1 = channel.DisposeAsync();
var task2 = channel.DisposeAsync();
await task1;
await task2;

Also, for this test the CloseReason of Connection has HandleChannelCloseOkAsync is the stack trace as close was invoked from the app. But for running load tests on the app using this library, I see more frequently HandleChannelCloseAsync in the CloseReason stack trace. I tried to debug it locally when exception is thrown from invoking a Command (e.g. QueueDeclarePassiveAsync with not existing queue name). When Dispose finishes execution before HandleChannelCloseAsync finished, then error occurs. In the app, the channel is disposed on every exception as it shouldn't be reused for subsequent requests.

        try
        {
            return await channel.QueueDeclarePassiveAsync(queueName);
        }
        catch (Exception)
        {
            await channel.DisposeAsync();
            throw;
        }

I'm not completely sure what is handled in the HandleChannelCloseOkAsync and HandleChannelCloseAsync. Is it necessary to always invoke these methods? If so, maybe Dispose should wait with semaphore disposal until these handles finish (with some timeout probably). Or these methods should ignore waiting for semaphore if it's already disposed.

michaelklishin · 2024-12-23T15:46:04Z

@ZajacPiotr98 as far as I understand, #1752 focuses on connection objects and you are trying with channels.

Their (channels') behavior should generally be identical to the idea above but you and @lukebakken may be focussing on the same problem in two different places.

lukebakken · 2024-12-23T16:04:53Z

@ZajacPiotr98 did you follow the same test procedure you detailed in this comment?

#1752 adds a _disposedValue check to the Channel class, so your second DisposeAsync() should do nothing. I'm going to add a basic test to the PR, but it won't use Clumsy, of course.

ZajacPiotr98 · 2024-12-23T16:17:57Z

Yes, maybe I was not precise if it was Channel or Connection. Now I've done quick test for Connection. Same thing with invoking DisposeAsync, and awaiting both tasks later. Also got an ObjectDisposedException. In this case From the invocation of Connection disposal await _channel0.DisposeAsync().

@lukebakken I've even tried without Clumsy and had same results. _disposedValue is set at the end of Dispose method invocation. If the second invocation starts before the first one is finished, then the problem occurs. The check for the _disposedValue in the second invocation finishes before setting this to true in the first one.

Additionally, to mitigate this behavior, as workaround, I added in the app the isDisposing field, that is setting to true at the start of Disposing channel and never lets run another DisposeAsync on that channel or only when first one is finished. But it still doesn't work for the HandleChannelCloseAsync case I described and when disposal is reaching the timeout on the app side (underneath commands are still received, only awaiting for disposal is cancelled in the app). I added that because of #1751

lukebakken · 2024-12-23T17:35:07Z

Ah, you're not awaiting the DisposeAsync(). Yes, I see the problem. Hopefully nobody does that in practice. I'll come up with a fix.

Fixes #1749 * Add test project * Add `FirstChanceException` logging. * Add code to repeatedly close connection. * Allow passing hostname to GH-1749 program as the first arg. * Toxiproxy tests fixup. * Standardize on `_disposedValue` name. * Ensure `_disposedValue` is set in `finally` block. * Add lock object for disposal of Channels and Connections. Note: a `SemaphoreSlim` can't be used because it must be disposed as well, and that can't happen cleanly in a `Dispose` method.

lukebakken · 2024-12-23T19:53:08Z

@ZajacPiotr98 - please give my PR a test. I've added what amounts to a "Disposing" state to channels and connections, that only allows one thread to do the disposal.

ZajacPiotr98 · 2024-12-23T22:41:25Z

I will be able to test it next week.

But the problem is not with the multiple dispose. I'm awaiting the dispose invocation and it only happens when multiple threads do dispose in the same time. But, like I mentioned I corrected this behavior with workaround on my side.

The problem, that is really impacting application is:
HandleChannelCloseAsync and HandleChannelCloseOkAsync are invoked or at least finish when the semaphore are already disposed and it throws the exception, that is not handles, thus closing entire connection with all channels opened on that connection (so all operations in progress are canceled). This is impacting performance of the application and I don't see any option to suppress invocation of these HandleChannelClose methods.

I think invoking dispose on the exception or when channel is no longer needed should not close entire connection. But it does sometimes, when disposal of semaphores finishes before HandleChannelClose.

I only used double dispose without awaiting as an example to reproduce that issue. I couldn't find a way to reproduce it same as on the environment under load tests as there is significant load and thus closing channel is taking more time.

I asked about it here:

I'm not completely sure what is handled in the HandleChannelCloseOkAsync and HandleChannelCloseAsync. Is it necessary to always invoke these methods? If so, maybe Dispose should wait with semaphore disposal until these handles finish (with some timeout probably). Or these methods should ignore waiting for semaphore if it's already disposed.

michaelklishin · 2024-12-23T23:23:04Z

Well, concurrent disposal is a scenario I haven't seen mentioned here, I only thought of the case where a connectio nor channel is disposed N times sequentially.

Given that shared use of channels was not supported by this library till 7.0, and that concurrent disposal is not necessarily supported by a lot of libraries for other data services, this may be out of scope for this client.

Sharing one-off channels for passive declares between threads seems unnecessary to me but maybe that's an inevitable effect of adopting tasks for the parts that interact with RabbitMQ :(

ramonsmits added the bug label Dec 16, 2024

lukebakken self-assigned this Dec 16, 2024

lukebakken added this to the 7.1.0 milestone Dec 16, 2024

lukebakken added a commit that referenced this issue Dec 19, 2024

Fix exceptions when connection is closed

b91b5d9

Fixes #1749 * Add test project

lukebakken added a commit that referenced this issue Dec 19, 2024

* Allow passing hostname to GH-1749 program as the first arg.

377dbe0

lukebakken added a commit that referenced this issue Dec 20, 2024

Ensure DisposeAsync and Dispose are idempotent

050173d

Fixes #1749

lukebakken linked a pull request Dec 20, 2024 that will close this issue

Ensure DisposeAsync and Dispose are idempotent #1752

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ObjectDisposedException when connection is closed from the server #1749

ObjectDisposedException when connection is closed from the server #1749

ramonsmits commented Dec 16, 2024

ZajacPiotr98 commented Dec 19, 2024 •

edited by lukebakken

Loading

lukebakken commented Dec 19, 2024 •

edited

Loading

ZajacPiotr98 commented Dec 20, 2024

michaelklishin commented Dec 20, 2024

lukebakken commented Dec 20, 2024

ZajacPiotr98 commented Dec 23, 2024 •

edited

Loading

michaelklishin commented Dec 23, 2024

lukebakken commented Dec 23, 2024

ZajacPiotr98 commented Dec 23, 2024 •

edited

Loading

lukebakken commented Dec 23, 2024

lukebakken commented Dec 23, 2024

ZajacPiotr98 commented Dec 23, 2024

michaelklishin commented Dec 23, 2024

ObjectDisposedException when connection is closed from the server #1749

ObjectDisposedException when connection is closed from the server #1749

Comments

ramonsmits commented Dec 16, 2024

Describe the bug

Reproduction steps

Expected behavior

Additional context

ZajacPiotr98 commented Dec 19, 2024 • edited by lukebakken Loading

lukebakken commented Dec 19, 2024 • edited Loading

ZajacPiotr98 commented Dec 20, 2024

michaelklishin commented Dec 20, 2024

lukebakken commented Dec 20, 2024

ZajacPiotr98 commented Dec 23, 2024 • edited Loading

michaelklishin commented Dec 23, 2024

lukebakken commented Dec 23, 2024

ZajacPiotr98 commented Dec 23, 2024 • edited Loading

lukebakken commented Dec 23, 2024

lukebakken commented Dec 23, 2024

ZajacPiotr98 commented Dec 23, 2024

michaelklishin commented Dec 23, 2024

ZajacPiotr98 commented Dec 19, 2024 •

edited by lukebakken

Loading

lukebakken commented Dec 19, 2024 •

edited

Loading

ZajacPiotr98 commented Dec 23, 2024 •

edited

Loading

ZajacPiotr98 commented Dec 23, 2024 •

edited

Loading