-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
assertion failed, "self._bus is None", bluez backend on Linux #641
Comments
Great analysis of the issue, thanks for that.
I'm guessing that #621 didn't fix the issue then? Looking at the bigger picture, I've been thinking that we should probably put locks on all methods that interact with hardware. This would prevent re-entrancy issues for users with parallel tasks as well (e.g. trying to connect to a device at the same time in two separate tasks). I think these sorts of issues are a problem on all backends, not just BlueZ. |
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by ensuring that we are not in the middle of `_cleanup_all()` before proceeding with the `disconnect()` method. Fixes #641.
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
Hi @egnor. I've make a couple of pull requests with possible solutions. Could you please run your stress test on them? |
Alas, as described in #631, this particular project moved to using a different stack, so it's not trivial for me to test back with bleak. I could dig into my git version history, but... I also just shipped off all my devices to a field deployment. (I'll get them back in a few weeks.) So probably just assume it's fixed until you see more complaints (from me or anyone else)? I think more synchronization here is absolutely a good move for all the reasons you mention. (Oh and yes, #621 hadn't helped -- I'd moved to head specifically to try that.) Thanks for all your work on this!! 🙏 |
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
When the following sequence of events occured, `assert self._bus is None` would be hit. - A "Connected": False property change signal is sent over D-Bus. - Bleak handles the event and calls the `_cleanup_all()` method. - The `_cleanup_all()` sends a D-Bus message to unsubscribe from signals and awaits it. - While the previous method is still awaiting, the disconnect method is called. - This method hits an assertation because the connected property is false but there is still a D-Bus bus connection since `_cleanup_all()` is still using it. This fixes the issue by making `_cleanup_all()` a syncronous method. Since we are closing the bus connection, we should not need to remove signal watchers first. Fixes #641.
Added ----- * Allow 16-bit UUID string arguments to ``get_service()`` and ``get_characteristic()``. * Added ``register_uuids()`` to augment the uuid-to-description mapping. * Added support for Python 3.10. * Added ``force_indicate`` keyword argument for WinRT backend client's ``start_notify`` method. Fixes #526. * Added python-for-android backend. Changed ------- * Changed from ``winrt`` dependency to ``bleak-winrt``. * Improved error when connecting to device fails in WinRT backend. Fixes #647. * Changed examples to use ``asyncio.run()``. * Changed the default notify method for the WinRT backend from ``Indicate`` to ``Notify``. * Refactored GATT error handling in WinRT backend. * Changed Windows Bluetooth packet capture instructions. Fixes #653. * Replaced usage of deprecated ``@abc.abstractproperty``. * Use ``asyncio.get_running_loop()`` instead of ``asyncio.get_event_loop()``. Removed ------- * Removed ``dotnet`` backend. * Dropped support for Python 3.6. * Removed ``use_cached`` kwarg from ``BleakClient`` ``connect()`` and ``get_services()`` methods. Fixes #646. Fixed ----- * Fixed unused timeout in the implementation of BleakScanner's ``find_device_by_address()`` function. * Fixed BleakClient ignoring the ``adapter`` kwarg. Fixes #607. * Fixed writing descriptors in WinRT backend. Fixes #615. * Fixed race on disconnect and cleanup of BlueZ matches when device disconnects early. Fixes #603. * Fixed memory leaks on Windows. * Fixed protocol error code descriptions on WinRT backend. Fixes #532. * Fixed race condition hitting assentation in BlueZ ``disconnect()`` method. Fixes #641. * Fixed enumerating services on a device with HID service on WinRT backend. Fixes #599. * Fixed subprocess running to check BlueZ version each time a client is created. Fixes #602. * Fixed exception when discovering services after reconnecting in CoreBluetooth backend.
This issue is closed, so I would suggest starting a new issue with a reproducible test case. |
Thank you for the fast response. Can you elaborate on the test case? As mentioned before, I'm not a Python developer, so I can't contribute any code which would make any sense. So I can reproduce the issue by using a device that I own with the tool that uses bleak. |
If you can reproduce the error with one of the example Bleak scripts or if you can create a minimum version of the code from the tool to reproduce the issue, that would be useful. Also, any logs of error stack traces, debug messages and Bluetooth packet could be useful (see troubleshooting page in the docs). |
bluetoothctl -v
) in case of Linux: 5.61Description
Running a stress test that consists of repeated discovery/connect/transmit cycles, after a couple hours I see this message, which seems to happen when closing a connection (BleakClient's
__aexit__
):What I Did
I'm talking to a BTLE device (a cheap digital nametag). Unfortunately this is not reproducible on demand, though it usually crops up (and given that it's an assertion error is fairly damaging).
Discussion
Here is the code in question with some commentary added:
Since this function returns early if
self._bus
is already None, if the "if self._disconnecting_event:
" and "elif self.is_connected
" conditions both fall through, the assertion is guaranteed to fail!Specifically, this could happen with a server side disconnection, with a race between
self._properties["Connected"]
being set Falsedisconnect()
_cleanup_all()
task spawned by the property change handlerThis would all be a lot more bulletproof if
_cleanup_all()
were synchronous so there was no intermediate state withis_connected
false but_bus
still non-None. This appears to be there to support_remove_signal_handlers
which requires an async call toremove_match
.Assuming that all has to be that way, I'd propose
_disconnecting_event
is triggered (and removed) exactly whenself._bus
is set to None_bus_removal_event
to drive that point homedisconnect()
to check for the event (and pile on it), then check forself._bus
instead of checkingis_connected
?The text was updated successfully, but these errors were encountered: