-
-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vcos_semaphore_wait_timeout() and RTC adjusting #56
Comments
Actually, this is a fatal error that should not occur under normal conditions. If this happens, it means something is going very wrong. There may be something wrong with your kernel or GPU configuration. Show me |
Sure thing. It's pretty much just stock Raspberry Pi OS with the tc358473 and dwc modules loaded.
|
Add The name EAGAIN may be misleading. This may occur when waiting for a semaphore that indicates the end of encoding. Three seconds to encode a single image is a very, very long time. Therefore, reaching the timeout means that the hardware has entered a strange state and is unusable. Further decisions should be made by the user. So I would prefer to leave this logic unchanged. PS: plz combine PRs about typos into one. |
Will do, thanks!
Fair enough.
No problem. Done: #57 |
Sorry, I'm actually still seeing this even with the changes to https://gist.github.com/mtlynch/31e4c000767ed25b45c62bea6d3c62c0 Here's just the failing worker:
I'm confused by the timing. According to the time of day time, it's 13 seconds between the start of compression and |
Okay, let's get this straight. There are no miracles, it can't be that everything works for me, but for you it breaks.
|
Sorry, I didn't notice your message about the time in my sleep. I finally woke up and realized the problem. What if the system clock is weird? Are you running ntp? |
I wonder if there is a way to check if these values are active in the OS vs looking at the boot files on the file system? Time to Google that. |
The ustreamer clock is a monotonically increasing kernel timer. I trust this value a little more than the rest of the numbers. I think I understand. For some reason, your OS changes the system clock. Here the code. In turn In my opinion, using these functions are broken by design, but it is unlikely that anyone will change this so as not to break the existing code. In our case, checking after EAGAIN that the required amount of time has actually passed would help. But this will not save you from the situation if your clock is moved to the past. For temporary fix uncomment this line and comment the whole In the meantime, I'll figure out how best to fix it. |
If the system clock changes significantly while uStreamer is running with the OMX encoder, it breaks the encoder because it thinks a 3 second semaphore timeout has occurred when it's actually just a time jump from network time adjustment. See: pikvm/ustreamer#56 (comment) This fix adds a time-sync.target dependency to the uStreamer service to ensure that uStreamer doesn't start until the system has completed its network time sync during boot.
Thanks for digging into this @mdevaev! I'm surprised this isn't more common. Shouldn't it occur any time the Pi adjusts its time by more than 3s when uStreamer is running with OMX encoding? I run uStreamer as a systemd service that auto-runs at boot, so if others are starting uStreamer later after boot, then maybe it's more rare for significant time adjustments to happen while uStreamer is running. One way to consistently repro:
The Pi's clock will be behind real world time because it fell behind while the device was powered off. When the Pi regains Internet access and updates its clock, it will hit the I considered fixing this on my end by adding a dependency on time-sync.target, but that feels pretty kludgy and still doesn't mitigate the error in scenarios where the device can't access NTP at boot time but regains access after uStreamer is already running.
That does work, thanks!
Please let me know if I can help out in this investigation or contributing code to the fix.
@LeeNX - I'm not sure how to check the cma setting at runtime, but you can verify the GPU setting like this: $ sudo vcgencmd get_mem gpu
gpu=256M |
@mtlynch It's hard to say what this is about. Maybe your NTP server periodically lies and gives the wrong time, and then fixes it. Perhaps the Pi clock is somehow lagging behind. Time problems are quite common, but this is usually taken into account. Like everyone knows that I made another fix that slightly increases the load on the processor (just a couple of percent), but saves the use of timeout. Try it please: https://github.com/pikvm/ustreamer/tree/semwait |
Since the Pi has no RTC, isn't it expected that it always lags behind after being powered off? It has no way of knowing how much time has passed while it was powered down until it syncs with NTP again.
Fix worked for me. I repro'ed the delayed access to NTP method, and uStreamer produced no errors and continued encoding with OMX. CPU load remains ~14-17% on Pi 4B 2GB. |
Well, yes, that's right. In your case, NTP is started at a later date. Pi-KVM is designed differently, so it is not affected by this problem when booting.
Nice. Before making a final decision, I will wait for a response from the RPi developers. They may want to make a fix to their code. |
Although I have never encountered a hang on this semaphore. There are three possible reasons: incorrect code, empty input data, or a hardware error. The first is already debugged, the second I filter, and the third I haven't seen yet, even considering the entire user base. Probably a good enough solution would be to use the function without timeout (infinite waiting). Anyway I am waiting for a response from RPi's devs. PS: Many, many thanks :) |
No response has been received. I will enable the default implementation without timeout. The timeout can be enabled by a flag during compilation. |
Thanks! I bumped the bug to see if adding extra context about repro steps might help. |
I noticed that when I use OMX encoding, uStreamer switches to CPU encoding on any OMX encoding failure:
Relevant code is here:
https://github.com/pikvm/ustreamer/blob/d9b91a1d5f9f7cad8aaf031e6ef5cda59c52a431/src/encoder.c#L218L239
The permanent switch to CPU surprised me, since
EAGAIN
seems like only a transient error. I tried patching the code to make the CPU encoding temporary (just for the current buffer), and it seems to work as expected.Would you be open to making uStreamer wait longer before deciding to switch permanently to CPU-only encoding?
Proposed solutions
A. Only temporarily fall back to CPU
The simplest solution is to make the fallbacks always temporary. I put together a quick implementation of this solution:
https://github.com/pikvm/ustreamer/compare/master...mtlynch:only-fallback-temporarily?expand=1
B. Fall back permanently after N consecutive failed attempts
A more advanced version of (A) would be to maintain a count of successful buffer encodings and only fail over to CPU if there have been N consecutive failed encodings with OMX.
C. Fall back temporarily after EAGAIN, permanently after EINVAL and other errors
The other way to come at it would be to distinguish between fatal and non-fatal errors from the OMX encoder. We could treat EAGAIN as non-fatal (fall back to CPU just for this buffer) and errors like EINVAL as fatal (permanently switch over to CPU).
The text was updated successfully, but these errors were encountered: