Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ObjectDisposedException when connection is closed from the server #1749

Open
ramonsmits opened this issue Dec 16, 2024 · 13 comments · May be fixed by #1752
Open

ObjectDisposedException when connection is closed from the server #1749

ramonsmits opened this issue Dec 16, 2024 · 13 comments · May be fixed by #1752
Assignees
Labels
Milestone

Comments

@ramonsmits
Copy link
Contributor

Describe the bug

Logging FirstChanceExceptions shows some strange ObjectDisposedException:

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v7.0.0/projects/RabbitMQ.Client/Impl/Connection.Heartbeat.cs#L164 accessing .Token here can result in an ObjectDisposedException. That is catched, but the comment states that its for the timer

The SetSessionClosingAsync uses a _closingSemaphore that can already be disposed. It seems this isn't gracefully handled in the stacktrace.

https://github.com/rabbitmq/rabbitmq-dotnet-client/blob/v7.0.0/projects/RabbitMQ.Client/Impl/MainSession.cs

I'm not sure if this is causing issues but looking at the code it might be needed that these code paths all need to have exception handling to prevent code not to terminate if these ObjectDisposedExceptions are to be expected.

Reproduction steps

  1. Use MassTransit.RabbitMQ 8.3.4
  2. Log FirstChanceException
  3. Create a consumer and connect to the broker
  4. Close the connections on the broker
  5. Observe the consumer trying to reconnect

Do this a couple of times and you'll see the related exception

Expected behavior

gracefully handle ObjectDisposedException on all code paths for closed connections

Additional context

No response

@ramonsmits ramonsmits added the bug label Dec 16, 2024
@lukebakken lukebakken self-assigned this Dec 16, 2024
@lukebakken lukebakken added this to the 7.1.0 milestone Dec 16, 2024
@ZajacPiotr98
Copy link

ZajacPiotr98 commented Dec 19, 2024

I encountered similar problem, but kind of backwards, because this ObjectDisposedException caused closing connection with such close reason.

CloseReason: AMQP close-reason, initiated by Library, code=541, text='Cannot access a disposed object. Object name: 'System.Threading.SemaphoreSlim'.', classId=0, methodId=0, exception=System.ObjectDisposedException: Cannot access a disposed object. Object name: 'System.Threading.SemaphoreSlim'. at System.Threading.SemaphoreSlim.WaitAsync(Int32 millisecondsTimeout, CancellationToken cancellationToken) at RabbitMQ.Client.Impl.Channel.MaybeHandlePublisherConfirmationTcsOnChannelShutdownAsync(ShutdownEventArgs reason) at RabbitMQ.Client.Impl.Channel.OnChannelShutdownAsync(ShutdownEventArgs reason) at RabbitMQ.Client.Impl.Channel.OnSessionShutdownAsync(Object sender, ShutdownEventArgs reason) at RabbitMQ.Client.Impl.AsyncEventingWrapper1.InternalInvoke(Delegate[] handlers, Object sender, TEvent event)    at RabbitMQ.Client.Impl.AsyncEventingWrapper1.InternalInvoke(Delegate[] handlers, Object sender, TEvent event) at RabbitMQ.Client.Impl.Channel.HandleChannelCloseAsync(IncomingCommand cmd, CancellationToken cancellationToken) at RabbitMQ.Client.Impl.Channel.HandleCommandAsync(IncomingCommand cmd, CancellationToken cancellationToken) at RabbitMQ.Client.Framing.Connection.ProcessFrameAsync(InboundFrame frame, CancellationToken cancellationToken) at RabbitMQ.Client.Framing.Connection.ReceiveLoopAsync(CancellationToken mainLoopCancellationToken) at RabbitMQ.Client.Framing.Connection.MainLoop()

The error happened when there was significant load and RabbitMQ responses were several times slower than usual. But I haven't got to the bottom of issue and I don't know the root cause. I'm not sure if it's library itself or usage of the library causing the error, but I can't find any documentation about handling this.
The code that I use tries to close a channel before disposal, but only when channel IsOpen and the task is awaited until timeout is reached:
Plain Text
IChannel Instance
...
var timeoutCts = new CancellationTokenSource(timeout);
var closeChannelTask = this.Instance.CloseAsync();
await closeChannelTask.WaitAsync(timeoutCts.Token);

Without this timeout some of close requests were running for really long (maybe even infinitely? Some deadlock? There was no channel on server side), so the timeout was added to unblock the thread.

To be sure I also checked without running CloseAsync on IChannel at all, but it haven't helped.

lukebakken added a commit that referenced this issue Dec 19, 2024
@lukebakken
Copy link
Contributor

lukebakken commented Dec 19, 2024

@ramonsmits I still have yet to hit an ObjectDisposedException using my test project. If you could provide the entire stack trace, it would help me out.

Here is my test application:
https://github.com/rabbitmq/rabbitmq-dotnet-client/tree/rabbitmq-dotnet-client-1749/projects/Applications/GH-1749

It starts a consumer, then repeatedly uses the HTTP API to close the connection. I have left it running a long time in my test environment, and have yet to see ObjectDisposedException.

Maybe there's something up with my code? If you can take a look, I'd appreciate it.

@ZajacPiotr98
Copy link

I managed to reproduce the problem I described. I called DisposeAsync method twice simultaneously and extended Close response time using clumsy. First Dispose invocation called CloseAsync and it set CloseReason, so IsClosed was true. The second invocation went straight to the semaphores disposing. When Close response was returned, the semaphore was already disposed. It thrown ObjectDisposedException and closed entire Connection.

I think the ObjectDisposedException is the reason of that and a bug. However I'm not sure if closing connection on any exception is good approach. Probably there should be distinction between some server exceptions or any other that should cause it, and the exceptions thrown by bugs in the implementation.

@michaelklishin
Copy link
Member

For I/O exceptions, it is. And it can and does trigger a connection recovery.

For when a connection is closed, the methods should simply be made idempotent. And Dispose should likely be an equivalent of calling CloseAsync the first time, and equally idempotent the 2nd, 3rd time, and so on.

@lukebakken
Copy link
Contributor

For when a connection is closed, the methods should simply be made idempotent. And Dispose should likely be an equivalent of calling CloseAsync the first time, and equally idempotent the 2nd, 3rd time, and so on.

Yep, that's how it's supposed to work 😉

Thank you for the details, @ZajacPiotr98. Since you can reproduce this issue, would you be able to test a pull request once I open it?

lukebakken added a commit that referenced this issue Dec 20, 2024
Fixes #1749

* Add test project
* Add `FirstChanceException` logging.
* Add code to repeatedly close connection.
* Allow passing hostname to GH-1749 program as the first arg.
@lukebakken lukebakken linked a pull request Dec 20, 2024 that will close this issue
@ZajacPiotr98
Copy link

ZajacPiotr98 commented Dec 23, 2024

I tested these changes and it's not resolving problem for the case I run. It was simple test:

var task1 = channel.DisposeAsync();
var task2 = channel.DisposeAsync();
await task1;
await task2;

Also, for this test the CloseReason of Connection has HandleChannelCloseOkAsync is the stack trace as close was invoked from the app. But for running load tests on the app using this library, I see more frequently HandleChannelCloseAsync in the CloseReason stack trace. I tried to debug it locally when exception is thrown from invoking a Command (e.g. QueueDeclarePassiveAsync with not existing queue name). When Dispose finishes execution before HandleChannelCloseAsync finished, then error occurs. In the app, the channel is disposed on every exception as it shouldn't be reused for subsequent requests.

        try
        {
            return await channel.QueueDeclarePassiveAsync(queueName);
        }
        catch (Exception)
        {
            await channel.DisposeAsync();
            throw;
        }

I'm not completely sure what is handled in the HandleChannelCloseOkAsync and HandleChannelCloseAsync. Is it necessary to always invoke these methods? If so, maybe Dispose should wait with semaphore disposal until these handles finish (with some timeout probably). Or these methods should ignore waiting for semaphore if it's already disposed.

@michaelklishin
Copy link
Member

@ZajacPiotr98 as far as I understand, #1752 focuses on connection objects and you are trying with channels.

Their (channels') behavior should generally be identical to the idea above but you and @lukebakken may be focussing on the same problem in two different places.

@lukebakken
Copy link
Contributor

@ZajacPiotr98 did you follow the same test procedure you detailed in this comment?

#1752 adds a _disposedValue check to the Channel class, so your second DisposeAsync() should do nothing. I'm going to add a basic test to the PR, but it won't use Clumsy, of course.

@ZajacPiotr98
Copy link

ZajacPiotr98 commented Dec 23, 2024

Yes, maybe I was not precise if it was Channel or Connection. Now I've done quick test for Connection. Same thing with invoking DisposeAsync, and awaiting both tasks later. Also got an ObjectDisposedException. In this case From the invocation of Connection disposal await _channel0.DisposeAsync().

@lukebakken I've even tried without Clumsy and had same results. _disposedValue is set at the end of Dispose method invocation. If the second invocation starts before the first one is finished, then the problem occurs. The check for the _disposedValue in the second invocation finishes before setting this to true in the first one.

Additionally, to mitigate this behavior, as workaround, I added in the app the isDisposing field, that is setting to true at the start of Disposing channel and never lets run another DisposeAsync on that channel or only when first one is finished. But it still doesn't work for the HandleChannelCloseAsync case I described and when disposal is reaching the timeout on the app side (underneath commands are still received, only awaiting for disposal is cancelled in the app). I added that because of #1751

@lukebakken
Copy link
Contributor

Ah, you're not awaiting the DisposeAsync(). Yes, I see the problem. Hopefully nobody does that in practice. I'll come up with a fix.

lukebakken added a commit that referenced this issue Dec 23, 2024
Fixes #1749

* Add test project
* Add `FirstChanceException` logging.
* Add code to repeatedly close connection.
* Allow passing hostname to GH-1749 program as the first arg.
* Toxiproxy tests fixup.
* Standardize on `_disposedValue` name.
* Ensure `_disposedValue` is set in `finally` block.
* Add lock object for disposal of Channels and Connections. Note: a `SemaphoreSlim` can't be used because it must be disposed as well, and that can't happen cleanly in a `Dispose` method.
@lukebakken
Copy link
Contributor

@ZajacPiotr98 - please give my PR a test. I've added what amounts to a "Disposing" state to channels and connections, that only allows one thread to do the disposal.

@ZajacPiotr98
Copy link

I will be able to test it next week.

But the problem is not with the multiple dispose. I'm awaiting the dispose invocation and it only happens when multiple threads do dispose in the same time. But, like I mentioned I corrected this behavior with workaround on my side.

The problem, that is really impacting application is:
HandleChannelCloseAsync and HandleChannelCloseOkAsync are invoked or at least finish when the semaphore are already disposed and it throws the exception, that is not handles, thus closing entire connection with all channels opened on that connection (so all operations in progress are canceled). This is impacting performance of the application and I don't see any option to suppress invocation of these HandleChannelClose methods.

I think invoking dispose on the exception or when channel is no longer needed should not close entire connection. But it does sometimes, when disposal of semaphores finishes before HandleChannelClose.

I only used double dispose without awaiting as an example to reproduce that issue. I couldn't find a way to reproduce it same as on the environment under load tests as there is significant load and thus closing channel is taking more time.

I asked about it here:

I'm not completely sure what is handled in the HandleChannelCloseOkAsync and HandleChannelCloseAsync. Is it necessary to always invoke these methods? If so, maybe Dispose should wait with semaphore disposal until these handles finish (with some timeout probably). Or these methods should ignore waiting for semaphore if it's already disposed.

@michaelklishin
Copy link
Member

Well, concurrent disposal is a scenario I haven't seen mentioned here, I only thought of the case where a connectio nor channel is disposed N times sequentially.

Given that shared use of channels was not supported by this library till 7.0, and that concurrent disposal is not necessarily supported by a lot of libraries for other data services, this may be out of scope for this client.

Sharing one-off channels for passive declares between threads seems unnecessary to me but maybe that's an inevitable effect of adopting tasks for the parts that interact with RabbitMQ :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants