Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SEGV and deadlock when stopping async reads #129

Merged
merged 1 commit into from
Feb 27, 2023

Conversation

charlescochran
Copy link

As of mercuryapi-1.31.4.35, when tm_reader_async.c's TMR_stopReading()
is called by Reader_stop_reading(), it ultimately ends up waiting for
reader->readState to become TMR_READ_STATE_DONE while holding the GIL.
This happens in tm_reader_async.c's do_background_reads(), but not until
the tag queue is emptied by read callbacks. These are triggered by
invoke_read_callback(), but this function blocks until it can acquire
the GIL. This frequently results in deadlock upon attempting to stop an
asynchronous read.

Previously, this was addressed by temporarily setting self->readCallback
(and self->statsCallback) to NULL in Reader_stop_reading(), but this
causes a SEGV in invoke_read_callback() if it occurs after
self->readCallback is verified to not be NULL but before it is called.
Instead of temporarily setting these function pointers to NULL, release
the GIL in Reader_stop_reading(), allowing invoke_read_callback() to
empty the tag queue and thereby avoiding deadlock. Then, use a mutex to
prevent concurrent calls to TMR_stopReading(), since this was previously
prevented by the GIL.

Temporarily setting self->statsCallback to NULL in Reader_stop_reading()
also prevented stats callbacks from occurring in subsequent asynchronous
reads. This functionality is now fixed.

As of mercuryapi-1.31.4.35, when tm_reader_async.c's TMR_stopReading()
is called by Reader_stop_reading(), it ultimately ends up waiting for
reader->readState to become TMR_READ_STATE_DONE while holding the GIL.
This happens in tm_reader_async.c's do_background_reads(), but not until
the tag queue is emptied by read callbacks. These are triggered by
invoke_read_callback(), but this function blocks until it can acquire
the GIL. This frequently results in deadlock upon attempting to stop an
asynchronous read.

Previously, this was partially addressed by temporarily setting
self->readCallback (and self->statsCallback) to NULL in
Reader_stop_reading(), but this causes a SEGV in invoke_read_callback()
if it occurs after self->readCallback is verified to not be NULL but
before it is called.  Instead of temporarily setting these function
pointers to NULL, release the GIL in Reader_stop_reading(), allowing
invoke_read_callback() to empty the tag queue and thereby avoiding
deadlock. Then, use a mutex to prevent concurrent calls to
TMR_stopReading(), since this was previously prevented by the GIL.

Temporarily setting self->statsCallback to NULL in Reader_stop_reading()
also prevented stats callbacks from occurring in subsequent asynchronous
reads. This functionality is now fixed.
@charlescochran
Copy link
Author

@gotthardp @natcl

Can we merge this? I think it is beneficial to the library, and it may fix the root cause of a few of the currently open issues. See this discussion for detailed explanations concerning a different but-related upstream bug which also exists.

@gotthardp gotthardp merged commit 325efef into gotthardp:master Feb 27, 2023
@gotthardp
Copy link
Owner

@charlescochran Thank you! Looks reasonable, but I was not able to test it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants