Broken DNS over WAN / missing dnsmasq after quick link downs+ups #2950
We are seeing the very same behaviour with our nodes. They are running Gluon v2022.1.2 and Gluon v2022.1.3 with Tunneldigger L2TP Mesh-VPN enabled. Even worse: in our testing, Gluon v2023.1 is in this state even on a fresh boot. |
Can you test if #2969 mitigates the issue you are seeing? |
A little update and more background info: The issue was caused by power/undervoltage problems of a Mikrotik RB260GSP switch (DC-in range: 11V-30V), which rebooted frequently whenever the compressor of a cooling box on the same 12V power supply started. The RB260GSP is directly connected to the Plasmacloud PA2200 running Gluon with two LAN cables. The RB260GSP is now on a separate 24V power supply. But I can still reproduce the Gluon issue by just disconnecting the LAN cable from the PA2200's WAN port. After about 1-5 minutes dnsmasq reproducibly segfaults:
Original content of /var/gluon/wan-dnsmasq/resolv.conf (before disconnecting the LAN cable; it becomes a 0-byte file right after disconnecting):
I also tried to reproduce the issue in an x86_64 qemu KVM instance. I can set the WAN port's carrier to NO-CARRIER via the qemu monitor's set_link command. However, dnsmasq does not seem to crash there. (Maybe it's necessary to have a DHCP server connected to the WAN side to hand out IP addresses, routes and DNS servers, which I didn't have in my qemu tests yet.) @blocktrron #2969 does indeed seem to mitigate the issue: procd then successfully restarts dnsmasq after its segfault. |
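For anyone who wants to repeat the qemu test with several quick link downs+ups instead of a single carrier loss, here is a minimal sketch that only generates the monitor command sequence. `set_link <name> on|off` is the qemu monitor command mentioned above; the device name `net0` and the helper `link_flap_commands` are assumptions for illustration and have to be adapted to the actual VM configuration. Whether this triggers the segfault is untested.

```python
from typing import List


def link_flap_commands(device: str, flaps: int) -> List[str]:
    """Return qemu monitor (HMP) commands that take the link of
    `device` down and back up `flaps` times in a row."""
    commands: List[str] = []
    for _ in range(flaps):
        # "off" makes the guest interface report NO-CARRIER
        commands.append(f"set_link {device} off")
        # "on" restores the carrier again
        commands.append(f"set_link {device} on")
    return commands


if __name__ == "__main__":
    # "net0" is a hypothetical device id; check "info network"
    # in the qemu monitor for the real one.
    for line in link_flap_commands("net0", 3):
        print(line)
```

The printed lines can be pasted into the qemu monitor, with pauses in between as needed.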
Also here's a core dump of dnsmasq from this device after the segfault. Though probably quite useless with the missing debug symbols etc. (at least I can't get anything useful out of it with gdb right now and it does not want to generate a backtrace for me): https://speicher.hamburg.freifunk.net/d/b6c6c1af9fb341cfbc32/ |
@T-X Can you try to generate a backtrace with a build for which you have the unstripped dnsmasq and libc binaries? |
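A rough sketch of how such a backtrace could be requested from gdb in batch mode, assuming the unstripped binaries from the build are available under a local directory. `--batch`, `-ex`, `set sysroot`, `core-file` and `bt` are regular gdb options/commands; the executable name `gdb-multiarch`, all paths and the helper `gdb_backtrace_argv` are assumptions for illustration only.

```python
import shlex
from typing import List


def gdb_backtrace_argv(binary: str, core: str, sysroot: str,
                       gdb: str = "gdb-multiarch") -> List[str]:
    """Build a gdb command line that prints a backtrace for `core`.

    The sysroot is set before the core file is loaded, so that gdb
    looks up shared libraries (e.g. libc) in the unstripped build tree.
    """
    return [
        gdb, "--batch",
        "-ex", f"set sysroot {sysroot}",
        "-ex", f"core-file {core}",
        "-ex", "bt",
        binary,
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    argv = gdb_backtrace_argv(
        binary="./root/usr/sbin/dnsmasq",
        core="./dnsmasq.core",
        sysroot="./root",
    )
    print(" ".join(shlex.quote(arg) for arg in argv))
```

This only prints the command line; it still has to be run on a machine with a gdb build that understands the target architecture.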
@T-X good to know, although it is not a proper fix by any means |
Here's a core dump made with the unstripped dnsmasq (and hopefully with the unstripped glibc): https://speicher.hamburg.freifunk.net/d/92a69e30cc0d478b8a3a/ However, I'm still unable to get a backtrace from it via gdb (but at least it's not complaining about missing register info anymore):
I had copied Edit: libc.so was stripped. This one should be with an unstripped libc.so. Still the same issues with getting a backtrace from gdb: https://speicher.hamburg.freifunk.net/d/de1e7a79a92c49b8acf6/ |
The files you've provided produce this backtrace for me:
|
Moving this to the next milestone, as a workaround using procd has been implemented. |
Bug report
What is the problem?
What is the expected behaviour?
Gluon Version:
Site Configuration:
Custom patches:
update1: