Troubleshooting
If Runtime doesn't start up or exit properly, check that no residual objects remain from the previous run of Runtime. After cleaning up any residual objects, start Runtime either by starting all the services again:
service shm_start start
service net_handler start
service dev_handler start
service executor start
or by using the scripts/run.sh
shell script with ./runtime run
in the terminal from any directory.
If running Runtime manually (i.e. using ./runtime run
), try to stop it using ./runtime stop
first. Then, check for residual processes by running the command ./runtime status
, which just runs the following command:
ps -efH -u ubuntu
This command lists all processes owned by the ubuntu
user on the Raspberry Pi. If you see anything Runtime-related, for example:
ubuntu 23623 1 0 11:41 pts/1 00:00:00 ./net_handler
kill the process by sending it SIGINT with kill -INT 23623
where you replace the 23623
with whatever the process ID of the residual process is.
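The lookup step above can be sketched as a small filter over the ps output. This is a minimal sketch, not part of Runtime: the function name runtime_pids is hypothetical, and the process names dev_handler and executor are assumed from the service names listed earlier (only ./net_handler appears in the sample output).

```shell
#!/usr/bin/env bash
# runtime_pids (hypothetical helper): read `ps -ef`-style output on stdin and
# print the PID (column 2) of each line whose command (column 8) is a
# Runtime process. The list of names is an assumption; adjust as needed.
runtime_pids() {
    awk '
        {
            cmd = $8
            sub(/^.*\//, "", cmd)    # drop a leading path such as "./"
            if (cmd == "net_handler" || cmd == "dev_handler" || cmd == "executor")
                print $2
        }'
}
```

For example, `ps -efH -u ubuntu | runtime_pids` prints one PID per line, and each of those can then be passed to kill -INT as described above.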
If running Runtime using the systemd
services, shutting down one of the main systemd
services (dev_handler
seems to be the most reliable) should stop all of Runtime:
sudo systemctl stop dev_handler
You can then check for residual processes and kill them in the same manner as described above.
Check that the challenge socket and the log FIFO are reset by removing them if they exist. Do:
ls /tmp
and if you see log-fifo
in the output, remove it with rm /tmp/log-fifo
. Remove the challenge socket the same way if it is listed.
If you are running tests, there might also be a bunch of unused virtual device sockets in the /var
directory that look like ttyACM*
, where *
represents some number. Remove all of those too, with rm /var/ttyACM*
.
Check that the shared memory is removed if it exists. Do:
ls /dev/shm
and if you see a whole bunch of files there, remove them with rm /dev/shm/*
.
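The file cleanup described above could be collected into one function. This is a sketch under stated assumptions: clean_runtime_files is a hypothetical name, the three directories are parameters that default to the paths given in the text, and the challenge socket is left out because its file name is not given here.

```shell
#!/usr/bin/env bash
# clean_runtime_files (hypothetical helper): remove the log FIFO, leftover
# virtual device sockets, and shared memory files.
# Arguments default to the directories named in the text.
clean_runtime_files() {
    local tmp_dir="${1:-/tmp}" var_dir="${2:-/var}" shm_dir="${3:-/dev/shm}"
    rm -f "$tmp_dir/log-fifo"     # log FIFO
    rm -f "$var_dir"/ttyACM*      # virtual device sockets left by tests
    rm -f "$shm_dir"/*            # every file in the shared memory directory
}
```

Because rm -f ignores missing files, the function is safe to call when nothing is left over. Note that the last line removes everything in the shared memory directory, exactly as rm /dev/shm/* does, so only call it once Runtime is fully stopped.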
The first step to debugging a seemingly working Runtime is always to get into the robot (i.e. get an ssh
session running between your computer and the robot). Without a way to see the state of Runtime through an ssh
session, Runtime is basically impossible to debug. From there, generally a good next step is to open the shared memory UI to get a real-time view of the shared memory of Runtime. After that, there aren't many more general rules that you can follow to diagnose the issue. We will simply write a laundry list of common commands that can be run to help diagnose the issue.
If you are sure that the Raspberry Pi is connected to a certain network (for example, if raspberrypi
is visible on the network router's admin page, along with its IP address), then the first step is to connect your computer to the network (try wirelessly at first). Test if you can ping
the Raspberry Pi in question by running
ping <IP address>
or
ping <hostname>.local
on your computer. You should get a 100% success rate of pings. If you get Received timeout seq=<x>
repeatedly, then either the Raspberry Pi is not on the network, your computer is not on the network, or the Raspberry Pi's ping service isn't working. Check to make sure that your computer is on the network, or get someone else to connect to the network and ping the Raspberry Pi to make sure your computer isn't the problem. If nobody can ping it, try to connect via Ethernet. If the ping is successful, try to ssh
into the Raspberry Pi using the command
ssh ubuntu@<IP address>
or
ssh ubuntu@<hostname>.local
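The ping-then-ssh sequence above can be sketched as follows. The function names probe and first_reachable are hypothetical, and the sketch assumes a Linux or macOS computer where ping accepts -c to limit the number of packets.

```shell
#!/usr/bin/env bash
# probe (hypothetical helper): succeed if HOST answers three pings.
probe() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# first_reachable (hypothetical helper): print the first HOST that answers
# and return 0, or return 1 if none of them answer.
first_reachable() {
    local host
    for host in "$@"; do
        if probe "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    return 1
}
```

A possible use is `host=$(first_reachable 192.168.0.101 raspberrypi.local) && ssh ubuntu@"$host"`, with the address and hostname replaced by those of the Raspberry Pi in question. Trying the IP address before the .local name is only a convention chosen for this sketch.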
To connect to the Raspberry Pi via Ethernet, first connect the network router to a network switch via an Ethernet cable. Then, connect the Ethernet port of the Raspberry Pi to one of the ports on the network switch with another Ethernet cable. Lastly, connect your computer to the network switch with a third Ethernet cable. You may want to turn off the Wi-Fi on your computer to ensure that your computer is using the Ethernet connection for all networking, as having two networking interfaces active on a computer can sometimes cause issues. After you do this, try checking the network's admin page. If all is well, you should see both your computer and the Raspberry Pi on the network, along with their associated IP addresses. Try to ping the device; if all is well, then try to ssh
into the device using the same command as mentioned above.
If contact cannot be established between the Raspberry Pi and any computer, wired or wirelessly, unfortunately the Raspberry Pi or SD card (or both) may be bricked. In that case, either a new SD card will need to be flashed or the old SD card will need to be transferred to a new Raspberry Pi. Try the latter option first, and if that doesn't work, try the former.
First, check to make sure that shared memory exists. If shared memory doesn't exist, try restarting Runtime. Usually this will remedy the problem.
To run the Shared Memory UI, navigate to the runtime/tests
directory and run make shm_ui
. The executable will be in runtime/tests/bin
; run with ./shm_ui
in that directory. This is probably the single most useful thing to do when trying to debug Runtime.
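The existence check that precedes the Shared Memory UI could be scripted roughly like this. It is a sketch only: shm_present is a hypothetical name, and it assumes that any entry in /dev/shm means Runtime's shared memory exists, which may not hold if other programs on the Raspberry Pi also use shared memory.

```shell
#!/usr/bin/env bash
# shm_present (hypothetical helper): succeed if the shared memory directory
# (default /dev/shm) contains at least one entry.
shm_present() {
    local shm_dir="${1:-/dev/shm}"
    [ -n "$(ls -A "$shm_dir" 2>/dev/null)" ]
}
```

If shm_present succeeds, build and start the UI as described above (make shm_ui in runtime/tests, then ./shm_ui in runtime/tests/bin); if it fails, restart Runtime first.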
Here is a list of other tools at your disposal to help diagnose the issue:
To get more information about Runtime, sometimes it is useful to run Runtime manually, as doing so dumps all logs of all levels to stdout
, i.e. your terminal screen. To do this, stop the Runtime systemd
services with sudo systemctl stop dev_handler
and then run Runtime manually with ./runtime run
from the root directory.
Sometimes, students will complain that a problem just happened to them but they can't reproduce it, or they claim that there's a problem with something but we can't reproduce it at the debug table. In that case, it can be useful to look at their Dawn console (if they still have it open), the logger file on the SD card of the Raspberry Pi, or both. This file is at runtime/logger/runtime.log
. By default, the logger will send every log with log level WARN
and up to this file. If something bad was indeed happening to the robot (lowcar
device misbehaving, student code erroring / spamming causing executor
to jam up, etc.) there should be a few logs indicating the issue in the file. Don't cat
the file directly to the screen, though, because it will likely be pretty long. Use the command less runtime.log
to view the file, and then press the u
and d
keys to navigate the file (and show more of the file).
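When the log file is long, it can also help to pull out only the most recent lines that mention something of interest. The sketch below makes no assumption about the log line format; recent_matches is a hypothetical name, and the search is a plain case-insensitive grep.

```shell
#!/usr/bin/env bash
# recent_matches (hypothetical helper): print the last COUNT lines (default 20)
# of FILE that match PATTERN, ignoring case, each prefixed with its line number.
recent_matches() {
    local file="$1" pattern="$2" count="${3:-20}"
    grep -n -i -- "$pattern" "$file" | tail -n "$count"
}
```

For example, `recent_matches runtime/logger/runtime.log executor` shows the latest lines that mention executor. The line numbers make it easy to jump to the same spot in less.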
It is often useful to see the status of the network connections on the Raspberry Pi. These can be viewed with the command
ifconfig
The output will show up as two or three sections, labeled lo
, eth0
, and wlan0
.
- lo refers to the loopback interface at IP address 127.0.0.1, and it is the interface of the device back to itself (hence the name "loopback")
- eth0 refers to the Ethernet interface, and it will appear automatically upon connecting the Raspberry Pi's Ethernet port to an active network router
- wlan0 refers to the wireless interface, and it will appear automatically upon connecting the Raspberry Pi to a wireless network
Things to watch out for include whether there is an IP address on the wlan0
interface (e.g. inet 192.168.0.101
), whether the wlan0
interface exists when you expect the Raspberry Pi to be connected to a wireless network, and whether the localhost
name is reported correctly (this allows the students and PiE staff to refer to the device as <hostname>.local
)
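The check for an IP address on a given interface can be sketched as a filter over the ifconfig output. This assumes the newer net-tools layout, where each section starts with an unindented line such as "wlan0: flags=..." and the address appears on an indented line as "inet 192.168.0.101"; older versions print "inet addr:" instead and would need a different pattern. The function name iface_inet is hypothetical.

```shell
#!/usr/bin/env bash
# iface_inet (hypothetical helper): read ifconfig output on stdin and print
# the IPv4 address of interface IFACE. Returns 1 if the interface is missing
# or has no inet line.
iface_inet() {
    awk -v want="$1" '
        # An unindented, non-empty line starts a new interface section.
        NF > 0 && substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" {
            cur = $1
            sub(/:$/, "", cur)
        }
        cur == want && $1 == "inet" { print $2; found = 1; exit }
        END { exit !found }
    '
}
```

For example, `ifconfig | iface_inet wlan0` prints the wireless address, and a failing exit status indicates that wlan0 is absent or has no address yet.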