Shared Memory Wrapper
This page discusses the shared memory wrapper, the design decisions that were made regarding the wrapper, and how some of the details work and enable certain optimizations.
Please read the wiki page on shared memory before proceeding.
Interprocess communication (IPC) is orders of magnitude slower than communication between threads in the same process, which is in turn slower than communication between functions in the same thread. If we could, the fastest way to ferry data between Dawn/Shepherd, student code, and the devices attached to the robot would be to have the entirety of Runtime contained within a single thread of a single process. This, however, is obviously impractical for code readability, complexity, and maintainability, so Runtime is broken down into several independent processes (which themselves have separate threads) to divide up the tasks. As a result, we now need to deal with IPC.
There are various methods of IPC; in order of (generally) slowest to fastest: file sharing, sockets, pipes and message queues, and shared memory. Since so much of the data is being transferred over IPC, and previous iterations of Runtime using pipes still had problems with the speed of the IPC, the decision was made to use shared memory for Runtime's IPC. Additionally, shared memory is well-suited for Runtime since the data that is passed between the processes is very structured and consistent (device data, gamepad data, robot state, robot log), so we don't have to worry about constantly expanding or shrinking the size of the shared memory blocks.
The shared memory wrapper is an interface for the various Runtime processes to access useful data about the current state of the system: whether Dawn is connected, which devices are connected, and device data.
Since nearly all internal communication on Runtime occurs through the shared memory blocks, and we want the effects of any updates to robot state to propagate through Runtime as quickly as possible, the wrapper should be optimized to some extent. To this end, we use several large bitmaps to try to reduce the amount of computation needed to retrieve updated information from the shared memory wrapper.
Though the shared memory wrapper should be fast, it should also be extremely robust and reliable, i.e. the data should not become corrupt under any circumstances. We also need to ensure that the shared memory blocks are created and removed reliably, since if the blocks are not removed reliably on Runtime shutdown, errors will occur on Runtime reboot. To this end, we use semaphores to control access to the various shared memory blocks, and have two programs, `shm_start` and `shm_stop`, which create and destroy all of the shared memory blocks, respectively. `shm_start` must be run before any of the Runtime processes initialize; `shm_stop` must be run once all Runtime processes terminate in the Runtime reboot sequence.
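As a rough illustration, here is a minimal sketch of the create/teardown pattern that `shm_start` and `shm_stop` follow, assuming POSIX shared memory objects and named semaphores. The object names `/rt_example_shm` and `/rt_example_sem`, the block size, and the error handling are placeholders for illustration, not Runtime's actual names or logic.

```c
// Minimal sketch of shm_start-style setup and shm_stop-style teardown,
// assuming POSIX shared memory and named semaphores. The object names
// below are placeholders, not Runtime's actual names.
#include <fcntl.h>
#include <semaphore.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   // placeholder size for the example block

// shm_start-style setup: create the block and its guarding semaphore.
void create_block(void) {
    int fd = shm_open("/rt_example_shm", O_CREAT | O_RDWR, 0660);
    if (fd == -1) { exit(1); }
    if (ftruncate(fd, BLOCK_SIZE) == -1) { exit(1); }  // size the block
    close(fd);                                          // processes mmap it later
    sem_t *sem = sem_open("/rt_example_sem", O_CREAT, 0660, 1);  // binary semaphore, initially free
    if (sem == SEM_FAILED) { exit(1); }
    sem_close(sem);
}

// shm_stop-style teardown: unlink both so a reboot can recreate them cleanly.
void destroy_block(void) {
    shm_unlink("/rt_example_shm");
    sem_unlink("/rt_example_sem");
}
```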
- `shm_start.c`: the source code for the program that creates and initializes all of the shared memory blocks and semaphores
- `shm_stop.c`: the source code for the program that destroys and unlinks all of the shared memory blocks and semaphores
- `shm_wrapper.h`: the interface presented by the shared memory wrapper to all of the processes in Runtime
- `shm_wrapper.c`: the source code that implements the shared memory wrapper
This section discusses some of the design decisions and the more obtuse parts of the code.
Please read the wiki page on semaphores before proceeding.
In the shared memory wrapper, each registered device has two associated semaphores: one for its `DATA` stream (where `dev_handler` writes device data for `executor` to read when executing student code), and another for its `COMMAND` stream (where `executor` writes commands from student code for `dev_handler` to pass to the devices). When any process wants to access the data on a particular stream by calling the shared memory wrapper, the wrapper automatically acquires the associated semaphore before accessing the data, and releases it after the access. There are also five additional semaphores. The first two, `cmd_map_sem` and `catalog_sem`, facilitate the fast transfer of commands from `executor` to `dev_handler`, and the connecting and disconnecting of devices, respectively. The next three, `rd_sem`, `gp_sem`, and `log_data_sem`, guard access to the robot description shared memory block, the gamepad state shared memory block, and the robot log data shared memory block, respectively. There is also an additional check to make sure that a gamepad is connected before attempting to access the gamepad state.
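To illustrate the access-guarding described above, here is a minimal sketch of the acquire/access/release pattern, assuming POSIX semaphores. The identifiers `data_sems`, `shm_devices`, `shm_device_t`, and `read_device_data` are hypothetical placeholders, not the wrapper's actual names.

```c
// Minimal sketch of the acquire/access/release pattern the wrapper uses.
// All identifiers below are hypothetical placeholders for illustration.
#include <semaphore.h>
#include <string.h>

#define MAX_DEVICES 32
#define MAX_PARAMS  32

typedef struct {
    double params[MAX_PARAMS];   // simplified parameter storage
} shm_device_t;

extern sem_t *data_sems[MAX_DEVICES];   // one DATA-stream semaphore per device
extern shm_device_t *shm_devices;       // mapped shared memory block

// Copy one device's DATA stream into caller-provided storage.
void read_device_data(int dev_ix, shm_device_t *out) {
    sem_wait(data_sems[dev_ix]);                      // acquire before touching shared memory
    memcpy(out, &shm_devices[dev_ix], sizeof(*out));  // critical section: copy the data out
    sem_post(data_sems[dev_ix]);                      // release as soon as the access is done
}
```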
We want each individual piece of data to have its own semaphore: if a single semaphore controlled access to too many different pieces of data, a process wanting one piece of data might block on a semaphore held by a process that isn't even accessing the same piece of data. For example, suppose a single semaphore controls access to data blocks 1 and 2, and processes A and B both access both blocks. Process A might lock the semaphore in order to modify data block 1; while it does so, process B tries to acquire the semaphore in order to modify data block 2 and is blocked. But this blocking is unnecessary: process A is modifying data block 1 and process B wants to modify data block 2, so they could safely modify the two blocks at the same time. It would be better to have two semaphores, one guarding data block 1 and another guarding data block 2. This is exactly what the shared memory wrapper does, which is why we use 70 semaphores!
The fast transfer of commands from `executor` to `dev_handler` is done as follows. There is a command bitmap, which consists of `MAX_DEVICES + 1` elements that are each `MAX_PARAMS` bits long. If every single bit in the entire bitmap is 0, then `executor` has not requested a change to any parameter of any device in the system. The bits in the first element of the command bitmap correspond to which devices have had their parameters changed; the bits in each of the other elements correspond to which params of a particular device have changed. This is best illustrated with an example.
Suppose that both `MAX_DEVICES` and `MAX_PARAMS` were 32. Then the command bitmap would be made up of 33 elements, each element being a `uint32_t`. At the beginning, suppose a device connects to the shared memory wrapper and is assigned the index of 0. This device also has 3 parameters, at indices 0, 1, and 2. When no changes are requested on this device, we have the following state (with the least significant bit (the 0th bit) on the right):

```
cmd_map[0]              cmd_map[1]              cmd_map[2]              .......  cmd_map[32]
<32 bits> ... 0000000   <32 bits> ... 0000000   <32 bits> ... 0000000   .......  <32 bits> ... 0000000
```
Now suppose `executor` is running some student code, and the student code has requested to change the param at index 1 of the connected device. When `executor` calls shared memory to write this new value to that device's `COMMAND` block, the wrapper takes note of this and sets the bit corresponding to that device to 1 in `cmd_map[0]`, and sets the bit corresponding to that param to 1 in that device's element (in this case, the 1st bit of `cmd_map[1]`). So after `executor` writes to the `COMMAND` block and returns, we now have this state (again, with the least significant bit (the 0th bit) on the right):

```
cmd_map[0]              cmd_map[1]              cmd_map[2]              .......  cmd_map[32]
<32 bits> ... 0000001   <32 bits> ... 0000010   <32 bits> ... 0000000   .......  <32 bits> ... 0000000
```

The 0th bit of `cmd_map[0]` being set to 1 says that the device at index 0 has one or more changed parameters. The 1st bit of `cmd_map[1]` being set to 1 says that the 1st parameter of the device at index 0 has had its value changed.
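To make the bookkeeping concrete, here is a small sketch of how the wrapper could set these bits when a command is written. The helper name `mark_param_changed` and the representation of `cmd_map` as a plain array are assumptions for illustration.

```c
// Sketch of the bookkeeping the wrapper could do when executor writes a new
// command value. The helper name and the plain-array layout of cmd_map are
// assumptions for illustration.
#include <stdint.h>

#define MAX_DEVICES 32
#define MAX_PARAMS  32

extern uint32_t cmd_map[MAX_DEVICES + 1];  // cmd_map[0] = changed devices; cmd_map[i+1] = changed params of device i

// Record that parameter `param_ix` of device `dev_ix` has a pending command.
static void mark_param_changed(int dev_ix, int param_ix) {
    cmd_map[0]          |= (1u << dev_ix);    // device dev_ix has at least one changed param
    cmd_map[dev_ix + 1] |= (1u << param_ix);  // that specific param changed
}
```

In the example above, `mark_param_changed(0, 1)` would leave `cmd_map[0]` equal to `0b1` and `cmd_map[1]` equal to `0b10`, matching the state shown.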
This makes several optimizations possible for `dev_handler`. `dev_handler` no longer needs to continuously read each connected device's command data and compare it with the command most recently sent to the device to determine whether a new command was issued. Rather, it can just continuously read `cmd_map[0]` and check whether it is equal to 0. If it is not equal to 0, at least one device has new commands to send, and `dev_handler` can proceed to retrieve those new values by calling `device_read` on exactly the devices specified in `cmd_map[0]`. Furthermore, `dev_handler` can simply pass the bitmap of changed parameters for that device to `device_read` as the argument `params_to_read`, so `dev_handler` does not waste any time retrieving, processing, packaging, or sending commands that have not changed. In our example, `dev_handler` would read `cmd_map[0]`, notice that the 0th bit is set to 1, and call `device_read` with `dev_ix = 0` and `params_to_read = cmd_map[1]` to retrieve the new param values.
Of course, after `dev_handler` retrieves the data, the shared memory wrapper clears all of the read bits (sets them back to 0).
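Putting the pieces together, here is a sketch of the kind of polling loop `dev_handler` can run on top of the command bitmap. The `device_read` prototype and the `poll_for_commands` helper shown here are assumptions for illustration; the real interface is declared in `shm_wrapper.h`.

```c
// Sketch of a dev_handler-style polling loop over the command bitmap.
// The device_read prototype and poll_for_commands helper are assumptions
// for illustration, not the wrapper's actual interface.
#include <stdint.h>

#define MAX_DEVICES 32

extern uint32_t cmd_map[MAX_DEVICES + 1];

// Assumed helper: reads the params selected by `params_to_read` for device `dev_ix`.
extern void device_read(int dev_ix, uint32_t params_to_read);

void poll_for_commands(void) {
    uint32_t changed_devices = cmd_map[0];  // in the real wrapper, this read is semaphore-guarded
    if (changed_devices == 0) {
        return;                             // no device has new commands; nothing to do
    }
    for (int dev_ix = 0; dev_ix < MAX_DEVICES; dev_ix++) {
        if (changed_devices & (1u << dev_ix)) {
            // Fetch only the params whose bits are set for this device.
            device_read(dev_ix, cmd_map[dev_ix + 1]);
        }
    }
    // The wrapper clears the read bits after the data is retrieved.
}
```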