-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Program gets stuck in reader.stop_reading() #110
Comments
I'm experiencing a similar situation on Windows; program hangs sometimes when stop_reading() is called. The reader stops, but the program never returns. It's easy to make happen when doing a repeated sequence of start_reading(), delay for a few seconds, and then stop_reading(). I built python-mercury module from source using Mercury API 1.35. |
I ran into this issue as well. After much work with gdb, I discovered the issue and implemented a fix (see the PR I just opened). Here is my commit message explaining the isse:
|
Very nice ! We've been waiting for a fix for this for a long time, will try it out, hope this gets merged ! |
Thanks a lot for your effort, @charlescochran! @natcl, please let me know whether the fix works for you as well. I will merge it then. |
Thanks, it might take a little while before we can test but I'll let you know ! |
@charlescochran I tried your pull request and I'm still getting a freeze on stopReading after a couple of minutes of calling it in a loop (every 4 seconds) |
Hmm. I have been regularly testing it over long periods of time and it has been not freezing. Perhaps you are encountering a separate bug. Can you run your program using |
Thanks will do ! |
|
I think I might have something missing to get the full stack ? |
Ah I installed ionotify and it seems better, new log coming soon |
That seems better:
|
Interesting. Please run |
Oh you want to verify if I'm running the correct code ? |
I want to verify that, and also that you have the other latest commits on master. There have been 6 commits since the latest release. I think I remember the latest release having bugs that were fixed by them. My branch is on the latest master rather than the latest release, so as long as you are not cherry-picking it, you should also have those bug fixes as well (assuming you are running my branch, of course). |
That looks correct, in the sense that that is all of my changes. Can you verify that you have these commits, as well? |
I do have those commits as well. I merged your branch but on a more recent local version (that has the get_read_state() method from this PR: #117) We're only calling stopReading once, the reason it says "Reader was already stopped" is that if the reader is started when we call initializeReader, we stop it, but in this case it's not called. |
I checked my program again and we are calling other methods before |
I may have spoke too fast, just after hitting send it crashed again, I wasn't running gdb this time so will run it again with gdb. |
Gotcha. I have never seen it get stuck in start_reading, so I did not attempt to fix that. From my experience, the deadlock in stop_reading happens quite quickly (after a few seconds/minutes), especially if you increase the rate of starting and stopping (try every 0.25 seconds instead of every 4). Perhaps you could test again without my commit and see if it deadlocks quickly in stop_reading. Then test with my fix, and hopefully it will only crash occasionally in start_reading. |
Here's the stack I got, it seems to still be in stop_reading:
|
I tried official build and it indeed crashes sooner (with the same stack trace as above) |
Interesting. If I run into a crash with the new code, I will let you know. |
Also, if I may ask, what is your reason for starting/stopping the radio on a regular interval? I know this should be fixable, but one reason for wanting to do this is to allow periodic calls to get_temperature(). After my change, this should be no longer be necessary, as the stats callback now works for all asynchronous reads, not just the first one. |
On our side it's because we need to write to the tag also so before writing we need to stop reading. |
By the way, on our side we're getting the temperature by enabling stats (the reader.enable_stats method). |
With the original code (without my code), does stats work for you after the first time you restarting reading? For me, with the original code, the stats callbacks would simply stop getting called after restarting the reading for the first time. For me, my code fixes this. |
It does seem to work on our side, |
Yep |
Ok that's strange, never had that issue here. |
@natcl I may have found the same deadlock you are experiencing with my code. This deadlock is directly related to an exception called "Buffer overflow", which occurs when asynchronous tag callbacks are not processing fast enough, leading to an excess (100+) of tag reads waiting to be processed. If you don't mind, try enabling exception handling (if you haven't already) with |
By the way, you can easily trigger a "Buffer overflow" by adding a sleep (0.1 seconds should do it; 0.5 definitely will) to your read callback function. |
Thanks, will give it a try soon ! |
@natcl @gotthardp |
@natcl @gotthardp |
@charlescochran thanks for the detailed testing ! Thanks ! |
Yes, I've found that timeout errors can occur for other reasons, for example in reader initialization. A buffer overflow seems to cause a timeout error (in v1.31.4.35, anyway), but other things can cause them too. I believe that the timeout error itself is not causing the deadlock. |
All right, fix for the local issue (but not the upstream issue) has been merged to master. If you still encounter this bug, I recommend trying the latest master rather than the latest tag release. |
I just ran into this, I think for the first time (I'm on my fourth project using Jadak's hardware). Using MercuryAPI 1.31.4.35 and python-mercurapi 0.5.4 (without the recent fixes). Both I solved it by running my readers as separate threads (I'm using two M6E Micro for this project) with all the heavy lifting in the main Python application thread. Seems to work ok, and the refactoring was an opportunity to improve the structure of my project, but I'm still interested to hear if/when there's an offical upstream fix. Big thanks to everyone involved in this project! Edit: It seems I spoke too soon; the overflow still happens, only less rarely now that the buffer is being emptied faster. But given enough visible tags (a dozen will do) and enough time (a minute or so), the buffer will still fill up. I don't see any way I could possibly make the buffer clear faster, now that the readers have their own threads. It looks like I'll have to pull latest and rebuild, in the hope that 325efef will mitigate the issue. |
I downloaded MercuryAPI 1.35.2.72 and used it to build a pull of latest master (including 325efef), and I'm happy to report that this seems to have fixed the issue; no hangs experienced on either |
@clickworkorange |
@charlescochran This whole thread was very helpful - one of those rare cases where there is an exact and detailed bug report of the issue I'm having, including a solution. But I have to admit I did not read your comments re. 1.35.2.72; I just went with it and it seems to work without any modifications. My modules run ancient firmware though (01.09.01), which might have something to do with it. I don't have access to a Win64 machine so can't run Universal Reader Assistant (on my Win10 VM it sees the modules but cannot connect) to update the firmware. |
@charlescochran Ha! I think I spoke too soon again - I'm now getting extremely weird segfaults and even a kernel panic. It seems libc gets it's knickers in a twist:
This foe is beyond me, I fear. |
@clickworkorange Yikes! I'm not quite sure if this related, but as mentioned in that same comment of mine above, there is still an upstream buffer overflow issue that I wasn't able to solve in this Python wrapper. I told Jadak about and they said they were able to reproduce it, so I expect them to release a fix for it eventually, but until that happens, I couldn't figure out a way for this library to be deadlock-free! You seemed to be experiencing the buffer overflow issue earlier, which wasn't surprising to me. If this new stuff is something different, that's unfortunate. You could also try changing the preprocessor macro from 0 to 1, as mentioned above, to see if that changes things. It seems to affect a lot. Old firmware could also play a role; I feel your pain having to have access to Windows! |
I finally managed to catch a segfault while running my Python app under
Does this provide any clues? |
|
Corrected broken link of ZIP file from jadaktech.com, to version AHAB
I'm testing the program by repeating reader.start_reading() and reader.stop_reading(). At some point (few minutes or few hours), reader.stop_reading() will not return resulting in the program getting stuck. I was unable to replicate the problem with a pure C implementation, so, there is something going on with the way python threads is mixed with c threads. Using gdb I was able to confirm that one of the c threads gets stuck
The text was updated successfully, but these errors were encountered: