
Unable to connect to overlay network after hibernation or changing the docking status on macOS #2196

Open
christian-schlichtherle opened this issue Jun 25, 2024 · 27 comments

@christian-schlichtherle

Describe the problem

After waking up my MacBook Pro from hibernation, Netbird fails to connect to the overlay network again:

$ sudo netbird status
Error: status failed: create wg interface: resource busy
$ sudo ifconfig utun100
utun100: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
	inet 100.90.208.249 --> 100.90.208.249 netmask 0xff000000
	inet6 fe80::f22f:4bff:fe13:efad%utun100 prefixlen 64 scopeid 0x20 
	inet6 fe80::%utun100 prefixlen 64 scopeid 0x20 
	nd6 options=201<PERFORMNUD,DAD>
$ sudo netbird up
Connected
$ sudo netbird status
Error: status failed: create wg interface: resource busy
$ sudo ifconfig utun100
utun100: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
	inet 100.90.208.249 --> 100.90.208.249 netmask 0xff000000
	inet6 fe80::f22f:4bff:fe13:efad%utun100 prefixlen 64 scopeid 0x20 
	inet6 fe80::%utun100 prefixlen 64 scopeid 0x20 
	nd6 options=201<PERFORMNUD,DAD>

I have NetBird UI 0.28.2 installed. My machine has multiple network interfaces: when undocked, it's only connected to my Wi-Fi; when docked, it's also connected via a 10 Gbps Thunderbolt Ethernet adapter. So when I dock or undock my notebook, the OS is effectively roaming. This may or may not be related to the problem.

I can run netbird down and netbird up, but it doesn't reconnect. WireGuard stays disconnected:

$ sudo wg show
interface: utun100
  public key: (not shown)
  private key: (hidden)
  listening port: 51820

The only way to reconnect is to reboot the OS, which is very annoying. Is there another workaround at least?

Expected behavior

Netbird should automatically reconnect to the overlay network after waking up the machine from hibernation or undocking/docking it.

Are you using NetBird Cloud?

Yes.

NetBird version

0.28.2

NetBird status -d output:

$ sudo netbird status -d
Error: status failed: create wg interface: resource busy
@mlsmaycon
Collaborator

@christian-schlichtherle can you upgrade to 0.28.3 and enable the network monitor with:

netbird down
netbird up -N

@bcmmbaga added the bug, client, networking and macos labels and removed the triage-needed label on Jun 25, 2024
@hurricanehrndz
Contributor

This is related to #2130

@christian-schlichtherle
Author

After upgrading to 0.28.3, a first test seems to be successful: a few minutes after waking up from hibernation, the NetBird client reconnects to the other peers one by one:

$ netbird status
OS: darwin/arm64
Daemon version: 0.28.3
CLI version: 0.28.3
Management: Connected
Signal: Connected
Relays: 2/2 Available
Nameservers: 0/0 Available
FQDN: (not shown)
NetBird IP: 100.90.208.249/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 10/10 Connected

I still have to test the docking/undocking scenario, so please don't close this ticket yet.

@hurricanehrndz
Contributor

Yeah, it is because of multiple interfaces. I am going to get a PR ready; hopefully the NetBird team can polish it to their standards.

@hurricanehrndz
Contributor

@christian-schlichtherle feel free to test out the branch if you know how

@mlsmaycon
Collaborator

@hurricanehrndz thanks for the PR, we will have a look and give you feedback ASAP.

@christian-schlichtherle If you want to test the PR change, you can download the files from here: https://github.com/netbirdio/netbird/actions/runs/9667635878/artifacts/1637361966 and replace the netbird binary on your system (probably /Applications/NetBird.app/Contents/MacOS/netbird) with the one from the package. See the example steps below:

sudo netbird service stop
sudo cp extracted/bin/path/netbird /Applications/NetBird.app/Contents/MacOS/netbird
sudo chmod +x /Applications/NetBird.app/Contents/MacOS/netbird
sudo netbird service start

@christian-schlichtherle
Author

christian-schlichtherle commented Jun 26, 2024

Following up on my testing: after docking my notebook with 0.28.3 installed, I run into the same problem again:

$ netbird status
Error: status failed: create wg interface: resource busy

Next, I will try the supplied patch.

PS: Same result when waking up from hibernation while docked, so the root cause is related to multiple NICs.

@christian-schlichtherle
Author

I've installed the new client:

$ netbird status
OS: darwin/arm64
Daemon version: 0.28.3-SNAPSHOT-2c869542
CLI version: 0.28.3-SNAPSHOT-2c869542
Management: Connected
Signal: Connected
Relays: 2/2 Available
Nameservers: 0/0 Available
FQDN: (not shown)
NetBird IP: 100.90.208.249/16
Interface type: Userspace
Quantum resistance: false
Routes: -
Peers count: 10/10 Connected

BTW: Following semantic versioning, the version tag should be 0.28.4-SNAPSHOT-2c869542 because 0.28.3-SNAPSHOT-2c869542 implies that it's a pre-release to 0.28.3, but it's not. Maybe, just maybe that's why the Netbird UI is now suggesting to "Download the latest version"?

@christian-schlichtherle
Author

christian-schlichtherle commented Jun 26, 2024

I've completed the test series now: With the new snapshot version, I can dock/undock/hibernate my notebook in any fashion and it reconnects seamlessly - great!

I noticed something interesting, however: when I run netbird status immediately after waking up from hibernation, it shows that some connections are still available, e.g. 6/10. Then, after three seconds, it drops to 0/10 and ramps back up to 10/10. This raises the question of whether the connection reset is really required after all.

As I understand it, I can suppress this with:

netbird down
netbird up --network-monitor=false

I will give that a try.

@hurricanehrndz
Contributor

The network monitor is beneficial when you go from dock to Wi-Fi, because the primary route changes. You can test the negative behaviour by SSHing to a device over the WireGuard interface.
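To see the kind of change the network monitor is meant to react to, the sketch below polls the macOS default route and prints a line whenever its outgoing interface changes, for example from a Thunderbolt Ethernet port to Wi-Fi when undocking. It is a minimal sketch, assuming that route -n get default reports the outgoing interface on an "interface:" line, as it does on current macOS releases. The helpers default_iface and watch_default_route are hypothetical and not part of NetBird, and this is not how NetBird's own network monitor is implemented.

```shell
#!/bin/sh
# Hypothetical helper: read "route -n get default" output on stdin and
# print the interface name from its "interface:" line (e.g. en0).
# Prints nothing if there is no such line (e.g. no default route).
default_iface() {
  awk '$1 == "interface:" { print $2; exit }'
}

# Hypothetical helper (assumes macOS "route -n get default"): poll the
# default route every INTERVAL seconds (default 2) and report whenever
# the outgoing interface changes. Runs until interrupted with Ctrl-C.
watch_default_route() {
  interval="${1:-2}"
  last=""
  while :; do
    current="$(route -n get default 2>/dev/null | default_iface)"
    if [ "$current" != "$last" ]; then
      echo "$(date '+%H:%M:%S') default route interface: ${last:-none} -> ${current:-none}"
      last="$current"
    fi
    sleep "$interval"
  done
}
```

Running watch_default_route in a terminal while docking, undocking or waking the machine shows when the primary route moves between interfaces, which can be compared against the times NetBird loses its peers.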

@fadyHemaya
Copy link

I am still facing this issue on 0.28.4

2024-07-28T11:15:52+03:00 INFO client/internal/routemanager/manager.go:133: Routing setup complete
2024-07-28T11:15:52+03:00 ERRO client/internal/engine.go:332: failed creating tunnel interface utun100: [resource busy]
2024-07-28T11:15:52+03:00 INFO client/internal/routemanager/manager.go:168: Routing cleanup complete
2024-07-28T11:15:52+03:00 DEBG client/internal/engine.go:1236: removing Netbird interface utun100
2024-07-28T11:15:52+03:00 ERRO client/internal/connect.go:263: error while starting Netbird Connection Engine: create wg interface: resource busy

@hurricanehrndz
Contributor

@fadyHemaya I can try and tweak the current code to see if we can get a build with better results

@pascal-fischer
Contributor

Hi @fadyHemaya, we added a waiting mechanism to the netbird down command that waits until all processes have stopped and the WireGuard interface has been removed. This change was released in version 0.28.5. Could you upgrade to a newer version and check whether you are still facing this issue?
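The same idea can be approximated from the user side: after netbird down, wait until utun100 is actually gone before running netbird up again, so the new tunnel is not created while the old device still exists (the "resource busy" error above). This is a minimal sketch, assuming that ifconfig IFACE exits non-zero once the interface no longer exists. wait_iface_gone is a hypothetical helper, and the 10-second default limit is an arbitrary choice, not the value NetBird uses internally.

```shell
#!/bin/sh
# Hypothetical helper: wait until network interface IFACE disappears.
# Usage: wait_iface_gone IFACE [LIMIT_SECONDS]
# Returns 0 as soon as "ifconfig IFACE" fails (interface gone),
# or 1 if the interface still exists after LIMIT_SECONDS (default 10).
wait_iface_gone() {
  iface="$1"
  limit="${2:-10}"
  elapsed=0
  while ifconfig "$iface" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "timeout waiting for $iface to be removed" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (macOS, interface name taken from the logs in this thread):
#   sudo netbird down
#   wait_iface_gone utun100 && sudo netbird up
```

If the helper times out, the interface is stuck in the state shown in the logs in this thread, where only a reboot was reported to free it.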

@MxD-js

MxD-js commented Aug 13, 2024

I'm on NetBird 0.28.7 on an M1 Mac and the issue is still present:

(base) user@Maxs-M1-Pro-Max ~ % netbird status
OS: darwin/arm64
Daemon version: 0.28.7
CLI version: 0.28.7
Management: Disconnected, reason: create wg interface: resource busy
Signal: Disconnected, reason: create wg interface: resource busy
Relays: 0/0 Available
Nameservers: 0/0 Available
FQDN: 
NetBird IP: N/A
Interface type: N/A
Quantum resistance: false
Routes: -
Peers count: 0/0 Connected
(base) user@Maxs-M1-Pro-Max ~ % 

@1doce8

1doce8 commented Aug 16, 2024

+1, NetBird 0.28.7 on an M1 Mac

@axuan25
Copy link

axuan25 commented Aug 28, 2024

Same here with 28.8 🙏

@rsalunga29
Copy link

Still facing the issue with v0.29.4 on an M3 Pro running Sonoma. It happens when I connect and disconnect a few times, and it only gets fixed when I restart my device. See logs:

2024-09-27T10:40:03+08:00 INFO client/internal/routemanager/manager.go:142: Routing setup complete
2024-09-27T10:40:03+08:00 ERRO client/internal/engine.go:334: failed creating tunnel interface utun100: [error creating tun device: resource busy]
2024-09-27T10:40:03+08:00 INFO client/internal/routemanager/manager.go:177: Routing cleanup complete
2024-09-27T10:40:03+08:00 DEBG client/internal/engine.go:1124: removing Netbird interface utun100
2024-09-27T10:40:08+08:00 INFO [relay: rel://netbird.anonymizeddomain.com:33080] relay/client/client.go:330: start to Relay read loop exit
2024-09-27T10:40:08+08:00 DEBG [relay: rel://netbird.anonymizeddomain.com:33080] relay/client/client.go:333: failed to read message from relay server: failed to get reader: failed to read frame header: EOF
2024-09-27T10:40:08+08:00 INFO [relay: rel://netbird.anonymizeddomain.com:33080] relay/client/client.go:521: closing all peer connections
2024-09-27T10:40:08+08:00 INFO [relay: rel://netbird.anonymizeddomain.com:33080] relay/client/client.go:529: waiting for read loop to close
2024-09-27T10:40:08+08:00 INFO [relay: rel://netbird.anonymizeddomain.com:33080] relay/client/client.go:531: relay connection closed
2024-09-27T10:40:08+08:00 WARN iface/iface.go:135: failed to remove WireGuard interface utun100: timeout when waiting for interface utun100 to be removed
2024-09-27T10:40:08+08:00 ERRO client/internal/engine.go:1127: failed closing Netbird interface utun100 failed to remove WireGuard interface utun100: failed to remove interface utun100: exit status 1 - ifconfig: SIOCIFDESTROY: Invalid argument
2024-09-27T10:40:08+08:00 ERRO client/internal/connect.go:282: error while starting Netbird Connection Engine: create wg interface: error creating tun device: resource busy

@hurricanehrndz
Contributor

@rsalunga29 can you verify that autoconnect and networkmonitor are both set to true in the config?

When the issue occurs, can you also share the output of the following:

sudo lsof /var/run/wireguard/utun100.sock
ps aux | grep -i netbird | grep -v grep

@asvataa
Copy link

asvataa commented Sep 29, 2024

+10 users with the same error on versions 0.28.* and 0.29.*

@christian-schlichtherle
Copy link
Author

It used to work just fine with v0.28.4, but somewhere between this and v0.29.4 it stopped working. Looks like a regression to me.

@hurricanehrndz
Contributor

Can we please get more information and logs? Without it, we are really just guessing at what is going on.

@hurricanehrndz
Contributor

@asvataa

asvataa commented Oct 2, 2024

@hurricanehrndz maybe I can help with it? Could you please describe the test scenario?

@hurricanehrndz
Contributor

hurricanehrndz commented Oct 3, 2024


In the link above you will find the artifacts for the main binary and the UI. Replace the ones installed in the app bundle with the new ones, then reboot.

@christian-schlichtherle
Author

I'm sorry I'm late to this conversation. We have upgraded to 0.30.0 today. Let me see if the problem surfaces again.

@hurricanehrndz
Contributor

Please do, as the PR was merged.

@christian-schlichtherle
Author

So far it works, but it's only one day later. ;-)
