Erlang revision

First day

Modules

Modules are like OOP classes. They are defined in a file, say person.erl, like this:

-module(person).
-export([func/1]).

func(Arg) -> ...

Processes

Processes are lightweight erlang processes, not OS processes. To spawn an erlang process, you call spawn, as such:

>spawn(person, func, ["Arg"]).

Message passing

Processes communicate exclusively through message passing, a la smalltalk; there is no concept of shared memory.

Sending

To send a message to another process, you use the ! operator, as follows:

Person ! {some_identifier, [0, 1, 2]}.

This will send the message {some_identifier, [0, 1, 2]}.

Receiving

To receive a message, inside a module, you use a receive block.

receive
    {some_identifier, Array} ->
        ...
end

The incoming message is pattern matched against each clause, so variable binding is done with pattern matching.

Variable binding

Variables can be bound exactly once, and cannot be modified nor reassigned. They are assigned with the pattern matching operator =.
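For example, in the shell (matching an already-bound variable against the same value succeeds; a different value raises a badmatch):

1> X = 5.
5
2> X = 5.
5
3> X = 6.
** exception error: no match of right hand side value 6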

Atoms

Atoms are like keywords; they are used to represent constants, where an enum would normally be used in other languages.

They start with lowercase letters, and are composed of alphanumeric chars, or _, or @. You can also quote them and use other characters.
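A few examples (the shell just echoes an atom back at you):

ok.
trap_exit.
'EXIT'.
'an atom with spaces'.
some_atom@some_node.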

Tuples

Tuples are what you saw earlier with the {some_identifier, [0, 1, 2]}. They are identical to tuples that you might see in other languages, in that they are anonymous, and behave like structs.

To create one, just type it out.

First = {first_name, "Sean"}.

They can also be nested:

Name = {First, {last_name, "Maher"}}.

>Name.
{{first_name, "Sean"}, {last_name, "Maher"}}.

Atoms in tuples

Atoms in tuples are used to mark what data is. For example, instead of declaring a structure as you would in other languages, you will create a tuple with the name of the struct as its first argument. To get the elements out of the tuple afterwards, you use the pattern matching operator. To get the results out of the Name I just created:

{{first_name, FName}, {last_name, LName}} = Name. % FName = "Sean"
{FName,               {last_name, LName}} = Name. % Fname = {first_name, "Sean"}

And FName will now be bound to “Sean” and LName to “Maher”. Note that the atoms are matched literally; they are not bound. So, if you were to try to match {{name, "blah"}, {whatever, "other_blah"}} against the above pattern, it would not match: in a case expression or function head the next clause would be tried, and a bare = match like this one would raise a badmatch error.

Lists

Lists are linked lists, and are created by using the []. Elements are comma-separated.

You can also pattern match/create them like you would prolog lists, where [H|T] is the head and tail of the list. H is the first element of the list, and T is the rest of the list.

This is the same as Car/Cdr if you’ve used lisp, and this expression can be used in pattern matching as well:

[First|Others] = ["One", "Two", "Three"].

In this case, First is “One”, and Others is [“Two”, “Three”].

func(List) ->
    case List of
        [Head|Tail] ->
            io:format(Head),
            func(Tail);
        [] ->
            0
    end.

Strings

Erlang strings are lists of characters, and each character is just an integer (a Unicode code point; for ASCII text that's the familiar ASCII value).

Note that this representation is memory-hungry (each character costs a whole list cell), so you probably shouldn’t overuse them.
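A quick shell illustration; the integers are just the character codes, and the shell prints a list of printable codes back as a string:

1> [104, 101, 108, 108, 111].
"hello"
2> hd("hello").
104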

Second day

Modules

Modules are like classes in OOP languages, also similar to namespaces in C++, similar to packages in common lisp. They’re the basic unit through which you organize your code.

When referring to functions, we use prolog-style slash notation, or Name/Arity; for example, test/2 refers to the function test taking two arguments.

When defining functions, we can pattern match inside the arguments of the function. The example given in the book is as follows:

-module(geometry).
-export([area/1]).
-import(other_module, [fun1/1, fun2/2]).

area({rectangle, Width, Height}) ->
    Width * Height;
area({square, Side}) ->
    Side * Side.

Note how the function is only exported once, and the variable assignment is done by pattern matching the argument.

The last thing to notice is the syntax here for the two clauses. There’s a “head” and a “body” separated by an arrow. The clauses are separated by a semicolon. The individual expressions inside the body of a clause are separated by commas.

Aside: Grammar

You may be thinking something along the lines of “ugh, this makes the erlang grammar super hard to read! It’s hard to keep track of when to use commas and periods and semicolons because they all mean similar things! Can’t we just use ; for everything?”

However, this is deceptive. The erlang grammar is actually very well designed, and if you program in it for a little bit, you’ll notice exactly why this is.

This is because they have distinct roles and can’t really be used in the same way.

  • The comma separates arguments, patterns, and data constructors.
  • The semicolon separates clauses.
  • The period separates entire functions.

The reason this is elegant is because you reuse these constructs every time you deal with something resembling ‘clauses’.

Some examples:

case Variable of
    Opt1 ->
        something;
    Opt2 ->
        something;
    Opt3 ->
        something
end

if
    Guard1 ->
        thing;
    true ->
        otherthing
end

func(one) ->
    1;
func(two) ->
    2;
func(three) ->
    3;
func(four) ->
    4;
func(five) ->
    5.

Do you see how they use the same syntax everywhere? It’s very good.

Double aside: Lisp

This is nearly identical to the way that after writing code in lisps for a little bit, it becomes a lot easier to parse code written in s-exps (i.e. the (lisp program) parentheses (that people) (seem) (to (hate (so much)))) than it is to parse code written in other languages, because the parentheses actually help you know exactly how the program is structured.

Higher order functions

This is it, the “real” reason that programming in this way is worth it. Higher order functions are functions that operate on functions. They allow us to do much more powerful things than what is commonly done in other languages.

Sadly, this includes basically everything we’ve done in school. We’ve hardly covered any of this at all.

Funs can be used for:

  • Passing pluggable functionality around; this is what allows map to be such a useful construct
  • Creating our own control abstractions, such as for-loops, named lets, and other things that are typically only accessible through a macro-like construct.
  • Implementing things like lazy evaluation, parser combinators, and a ton of more difficult things.

Syntax:

Function = fun
               (Arg) -> somevalue 
           end.

You may be wondering why I put the arg on a newline. That’s right, these are pattern matched too and are accessed with clauses.

fun 
    ({test, Val}) ->
        Val * 2;
    ({test2, Val2}) ->
        Val2 / 2;
    ({and_so_on_and_so_forth}) ->
        thats_a_long_name
end

We can pass funs as arguments to other functions, and call them as normal.
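A minimal sketch: bind a fun to a variable, call it directly, and pass it to lists:map/2 from the standard library:

Double = fun(X) -> X * 2 end,
Double(21),                     % 42
lists:map(Double, [1, 2, 3]).   % [2,4,6]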

aside: Functional programming

We haven’t done any functional programming before in classes. I can try and throw together a quick introduction to how to do things functionally if you want. Warning: it takes a while before you ‘understand’ it.

List processing

(“List Processing” is what LISP was originally an acronym for… how curious)

Erlang uses the same syntax as prolog does for list processing.

You can define a function on a list as follows:

sum([Head|Tail]) ->
    Head + sum(Tail);
sum([]) -> 0.

Note the way that we do pattern matching on lists…

Aside: Writing programs

Because all data is immutable in erlang, this allows us to write programs in a very peculiar way. Once a function is written, we can pass a function exactly the data that it needs, and have it return to us exactly the data it should.

There’s no more constructor-destructor nightmares of having to debug a stack trace from a program which exploded while inside the nested constructors of three objects…

What’s more, data creation is atomic. There is no ‘allocate this object, then memset it to 0, then manually set all the slots, then return a pointer to it’…

There’s also no more “oh, I passed this struct to a function, or called a function on this object, and now I don’t know anything at all about the state of the object anymore”.

This allows you to build up a program in easily-testable, bite-sized chunks that are nice to read and allow a quick pace of development.

List comprehensions

List comprehensions are basically a way for you to do a ‘for-each’ on lists. Super powerful.

Notation:

[Constructor || Qualifier1, Qualifier2, ...]

Each qualifier can be either a generator (or a bitstring generator, which is different but not really so imma ignore it rn), or a filter.

A generator looks like this:

Pattern <- ListExpr

And a filter is either a predicate (which is a fun that just returns bool), or a boolean expression.

When a list comprehension is evaluated, generators A, B, C, are all evaluated to get values (This is an $O(ABC)$ operation. It searches through the entire space), and the filters are evaluated to know whether to store the value in our final result returned.

An illustrative example of how this works is finding pythagorean triples:

for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
                for (int k = 0; k < n; ++k) {
                        if ((i + j + k <= n) && (i*i + j*j == k*k)) {
                                list.append({i, j, k});
                        }
                }
        }
}

And the same thing as an erlang list comprehension:

pythag(N) ->
    [{A, B, C} ||
        A <- lists:seq(1,N),
        B <- lists:seq(1,N),
        C <- lists:seq(1,N),
        A + B + C =< N,
        A*A + B*B =:= C*C].

Note the first three generators, and two filters at the end.

Guards

Guards are like filters to pattern matching.

When doing pattern matching, you can add a “when” clause as shown:

max(X, Y) when X > Y -> X;
max(X, Y) -> Y.

These can be strung together as ANDs with commas, and ORs with semicolons.

To illustrate that, “when a AND b OR c” is equal to “when a, b; c”. (Now that your brain is used to parsing , and ;, this should be easy to see)
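As a small, made-up illustration (valid/1 accepts a non-negative integer OR the atom infinity; commas are AND, the semicolon is OR):

valid(S) when is_integer(S), S >= 0; S =:= infinity -> true;
valid(_) -> false.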

The specifics of guards aren’t terribly important. You can do stuff like check types with is_X (where X is a type, like is_atom, is_binary), evaluate boolean expressions, etc. But you cannot call user-defined functions: erlang needs to be sure that evaluating a guard is cheap and side-effect free, so it can’t cause a huge slowdown or other problem.

Surprise, guards are actually the things used in if expressions.

if
    Guard, Guard2, Guard3; Guard4 ->
    % (Guard AND Guard2 AND Guard3) OR Guard4
        Something;
    Guard2 ->
        Somethingelse;
    true ->
        ...
end

You can also use them in case expressions, as shown:

case Expr of
    Pattern1 when Guard ->
        result;
    Pattern2 when Guard2 ->
        result
end

Aside: building lists

Lists should always be built by pushing onto the head of the list, never by appending to the tail. Appending walks (and copies) the whole list each time, which gives you an accidentally-quadratic situation; build in reverse and call lists:reverse/1 once at the end if the order matters.
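A minimal sketch of that pattern (double_all/1 is just a made-up example): push onto an accumulator, then reverse once at the end.

double_all(L) -> double_all(L, []).

double_all([H|T], Acc) -> double_all(T, [2*H | Acc]);
double_all([], Acc)    -> lists:reverse(Acc).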

Third day

Records

Records are like structs: a fixed number of named fields stored at fixed offsets. Under the hood they are just tuples (which behave like structs with anonymous slots); the record syntax simply gives the slots names at compile time.

You can define it as:

-record(name, {key = value, key2 = value2, key3}).
% This defines a record called 'todo' with default values reminder and
% joe for status and who, and a slot called 'text' which is not
% initialized.
-record(todo, {status = reminder, who = joe, text}).

These are defined in .hrl files, and are then included in other erlang files.

After loading this, we can then use the record as follows:

% this returns a record with the values expected.
#todo{}.
% This binds X to a record with some values initialized.
X = #todo{status = urgent, who = sean, text = "this is urgent"}.
% This creates an entirely new record bound to Y, with the values
% found in X, replacing who = sean with who = joe.
Y = X#todo{who = joe}.

You can pattern match with these.

clear_status(#todo{status=S, who=W} = R) ->
    %% Inside this function S and W are bound to the field
    %% values in the record
    %%
    %% R is the *entire* record
    R#todo{status=finished}.

This is the example shown in the book.

Maps

Maps are associative arrays with relatively fast lookup and relatively fast update times (they’re still immutable, but an updated map shares memory with the original for all the entries that didn’t change).

We use => to assign values to keys and := to update values of keys.

% when declaring a new map:
M = #{a => 1, b => 2}.
% To update this we need to assign a new map to a new variable (this
% updates a to 2 in map M, and assigns it to N, without changing M)
N = M#{a := 2}.
% of course we can also pattern match on maps:
#{key := Val} = SomeMap.

There’s a bunch of built-in functions on maps, in the standard maps module.
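For example, a few of them (the keys and values here are just placeholders):

M = #{a => 1, b => 2},
maps:get(a, M),         % 1
maps:put(c, 3, M),      % #{a => 1, b => 2, c => 3}
maps:keys(M),           % [a,b]
maps:find(missing, M).  % error (maps:find returns {ok, V} when the key exists)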

Error handling

This one didn’t seem too important, or too hard to wrap your head around. It basically builds upon what you already know.

try catch

Just like with a case expression, we can evaluate an expression, pattern match on its result, and return a value; if an exception is raised during the evaluation, we catch it and execute the clause corresponding to that exception.

Variable = try fun_or_val_or_other of
               thiskeyword ->
                   someresult;
               {some, tuple} ->
                   1234
           catch
               ExceptionType1: ExceptionPattern ->
                   val1; 
               ExceptionType2: ExceptionPattern2 ->
                   val2
           after
               code_which_gets_executed_at_the_end_of_this_block_regardless_,
             of_what_happens_and_whose_return_value_is_discarded
           end.

These exception types are atoms: either throw, exit, or error, and the patterns can be whatever.

There are some intricacies here. It works as follows:

  • The expression after try is evaluated, and if no exception is raised, its value is pattern matched against the clauses between of and catch, exactly like a case expression.
  • If an exception is raised while evaluating the head of the statement, the exception patterns are matched against it, and if none of them match, the exception propagates up out of the try expression (and will terminate the process if nothing above catches it).

Fourth day

This day is all about the simple concurrency primitives that erlang offers us. All we have is spawn, send, and receive.

Spawn

Spawn can be called in two ways:

Pid = spawn(Module, Func, ArgList).

This spawns a new process running Module:Func with the arguments in ArgList, so Func must name an exported function whose arity is length(ArgList).

Pid = spawn(Fun).

This evaluates Fun in a new process.

The Pid returned (Process Identifier) identifies the new process and is what you use to send it messages.

Message passing

To pass a message to a process, we use the ! notation, or as shown:

Pid ! Msg

This passes Msg to Pid.

Receive

The receive block is another clause-based control structure. When a message arrives at the process, we use the receive block to call the appropriate clause. The syntax is similar to all the other control structures (case, if, function definition, etc):

receive
    Pattern1 when Guard ->
        Expressions1;
    Pattern2 ->
        other_thing;
    Pattern3 ->
        last_statement
end

And that’s it.

General machinations

Each process has a mailbox, and when you send to a process, you simply append to the mailbox, and the process will get to it eventually.

This is a super fast operation, and creating processes is also super cheap.

Example application

Here’s the example given in the textbook for running a server which calculates the area of shapes.

-module(area_server0).
-export([loop/0]).

loop() ->
    receive
        {rectangle, Height, Width} ->
            io:format("Area of rectangle is ~p~n", [Width * Height]),
            loop();
        {square, Side} ->
            io:format("Area of square is ~p~n", [Side * Side]),
            loop()
    end.

And to spawn this in the shell:

1> Pid = spawn(area_server0, loop, []).
<0.36.0>
2> Pid ! {rectangle, 6, 10}.
Area of rectangle is 60
{rectangle,6,10}
3> Pid ! {square, 12}.
Area of square is 144
{square,12}

Client server

The self function returns the current process’ pid. So, if we send this pid in a message, we can receive values.

Ask = fun(Pid) ->
              Pid ! {self(), {rectangle, 5, 2}},
              receive
                  Response ->
                      Response
              end
      end.
% (note: for a reply to actually arrive, the server loop above would have to
% send one back to the requesting pid instead of just printing the area)

Processes are cheap

Creating a large number of processes is incredibly cheap. Spawning (and later destroying) a single process takes on the order of a few microseconds.

Receive timeout

Receive by default blocks until a message is available, but if we don’t want that, we can use a receive timeout.

receive
    Pattern ->
        Expressions
after Milliseconds ->
        Expressions2
end

receive will wait for Milliseconds and, if no matching message arrives in time, will evaluate Expressions2 instead.

Using a timeout of 0 makes the receive non-blocking: it matches whatever is already in the mailbox and otherwise falls through immediately.

We can use after in order to implement varying levels of priority on matching. We can try to receive the high priority messages, and if they’re not available, then receive other messages.
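A sketch of that priority trick, checking the mailbox for high-priority messages with a zero timeout before accepting anything else (the {high, ...} message shape is made up):

priority_receive() ->
    receive
        {high, Msg} ->
            {high, Msg}       % a high-priority message was already waiting
    after 0 ->
        receive
            Any ->
                Any           % otherwise take whatever arrives next
        end
    end.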

Registering processes

We can use register(AnAtom, Pid) to register an atom to a process, so if we want to use it in another process, we can call whereis(AnAtom) which will either return a pid, or undefined. unregister(Atom) will unregister a registered atom, and registered() will return a list of all the registered atoms.
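A quick sketch using the area server from earlier (the registered name area is arbitrary):

Pid = spawn(area_server0, loop, []),
register(area, Pid),
area ! {square, 5},   % send via the registered name instead of the pid
whereis(area).        % returns Pid (or undefined if nothing is registered)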

Updating recompiled code

When spawning a process with an MFA (module, function, args), we can be sure that erlang will be able to swap in the new code when we dynamically recompile the module. This can’t be done with funs, so it is preferable to use MFAs for long-running processes.

Fifth day

Error handling in concurrent programs.

Don’t worry, this isn’t just try-catch for processes; the chapter is also about the philosophy behind erlang and the patterns in which to write your code.

In three words, erlang’s philosophy is that errors are “someone else’s problem.” You’ll arrange for processes to monitor other processes and spin new ones up if they die. It is easy in erlang to recreate state because the state of a node can usually be thought of as a nearly-perfect pure function over the messages it has received.

Semantics

Here’s a set of terms and their corresponding meaning.

Processes

Processes are the ‘erlang’ concept of processes. There are normal processes and system processes. To become a system process, a process evaluates process_flag(trap_exit, true).

Links

Links are ties between processes, which act as notifiers to others in case a linked process dies. This will come in the form of an error signal.

Link sets

The “link set” of a process $P$ is the set of processes that $P$ is linked to (recall that linking is symmetric).

Monitors

Monitors are like one-directional links.

Messages and error signals

Messages and error signals are both of the same class, in that they are the language through which processes can communicate. Error signals are sent automatically, messages manually. (The error signals reach the link set of a terminated process).

Receipt of an error signal

It is a bit disingenuous to refer to error signals as separate from messages, because an error signal is received as a message of the form {'EXIT', Pid, Why}, whose variables’ semantics are what you’d imagine.

If a normal process receives an exit signal whose Why is not the atom normal, it will die and broadcast its own exit signal to its link set.

Explicit error signals

You can run exit(Why) to terminate the current process and broadcast Why to its link set.

You can also run exit(Pid, Why) to send an error message to Pid containing Why. The process running exit/2 will not die.

Kill signals

When a process gets a kill signal, it dies. You generate these with exit(Pid, kill). This bypasses the normal broadcast and just kills the target. Using this should be reserved for unresponsive processes.

Links

Creating links is surprisingly simple. You need simply execute link(Pid).

If P1 calls link(P2), P1 and P2 are linked.

This can be chained together into somewhat useful constructs; consider:

  • If you have a group of processes which are disparately linked (you might imagine it as a long chain, as opposed to a complete graph), you can easily propagate errors across link sets and kill all the processes, like a spreading fire.
  • You can then program ‘firewalls’ which won’t die upon receipt of this specific error reason, stopping the propagation of the error and keeping it compartmentalized easily and naturally.

Monitors

Monitors are nearly exactly like links, except that they are one-directional and the watching process receives a “down” message rather than an exit signal. (Only system processes receive {'EXIT', Pid, Why} messages instead of simply being killed, so monitors let ordinary, non-system processes watch others.)

Primitives:

Here’s another laundry list of primitives. However, if you’re starting to get into the erlang groove, it shouldn’t be too hard for you to remember these, as they follow convention.

% spawn_link: This one behaves exactly as you'd expect. It spawns a
% process, links you with it, and then returns the spawned pid.
spawn_link(Fun) ->
    Pid.
spawn_link(M, F, A) -> % Module Fun Args
    Pid.

% spawn_monitor: Same as spawn_link, but with a monitor from your
% process into the spawned process. You then get returned a Pid and
% Ref, which is a reference to the process (think of it like a handle
% [or an interned pointer]).
spawn_monitor(Fun) ->
    {Pid, Ref}.
spawn_monitor(M, F, A) -> % Module Fun Args
    {Pid, Ref}.
% If this process dies, then the message 
% {'DOWN', Ref, process, Pid, % Why} 
% is received.

% This turns you into a system process
process_flag(trap_exit, true).

% this does exactly what you think
link(Pid) -> true.
            
% can you guess what this one does?
unlink(Pid) -> true.

% monitor: This sets up a monitor to a process. Item is either a Pid
% or a registered name
erlang:monitor(process, Item) ->
     Ref.

% can you imagine what this does
demonitor(Ref) -> true.

% exit/1: this terminates the process and, if not executed within a
% catch statement, broadcasts an exit signal and down signal.
exit(Why) ->
    none().
   
% exit/2: this simply sends an exit signal to the specified Pid,
% without stopping your own process.
exit(Pid, Why) -> true.

Constructs

Here are some more constructs using the above toolset:

Executing on exit of monitored

The following spawns a helper process that monitors Pid; when Pid dies, the helper calls the given Fun with the reason Pid exited.

Spoiler alert though, this code might not be as reliable as you think.

on_exit(Pid, Fun) ->
    spawn(fun() ->
                  Ref = monitor(process, Pid),
                  receive 
                      {'DOWN', Ref, process, Pid, Why} ->
                          Fun(Why)
                  end
          end).

Aside: spawning and linking being atomic

The two must be atomic, because if they were not, you could have the rare bug where a child exits before the link is made, and it terminates with no error sent. So, spawn_link is atomic.

Making a cluster that dies together

Let’s say you wanted to easily deploy a set of functions which would die together. They’re very good friends or something

start(Fs) ->
    spawn(fun() ->
                  [spawn_link(F) || F <- Fs],
                  receive after infinity -> true end % this is a timer waiting forever
          end).

You’d then deploy a monitor to a process running start.

Aside: Race conditions

Let’s think about what’s wrong so far. Because on_exit is being passed a Pid, it could be that this Pid is already dead and waiting is a fool’s errand. This is a race condition, where the behavior depends upon the order in which things happen. We need to make sure this doesn’t leak in. Using spawn_link and spawn_monitor, you should be able to imagine how you’d write those examples without having race conditions.

Double aside: Contrast this to normal lock-based concurrency

If you were running a normal lock-based program, you would not have the high-level ability that you do now. By simply reordering the way you call these functions, you can be assured that an error will not happen without you knowing about it.

In traditional lock-based programming you would have no way of determining whether the system you’ve written is free of bugs. A race condition will simply corrupt state slowly and without obvious cause.

The hours you could save using this paradigm over badly implemented locks are massive.

Sixth day

Today is our first day of true distributed programming! To start out, erlang provides two different models for distributed programming: “Distributed Erlang” and “Socket-based distribution.”

Distributed erlang is for programs written to run specifically on erlang nodes, where a node is just a BEAM instance. All the erlang tools that we’ve seen so far can be used in this case. Only trusted code should be run this way, because any node can perform any operation on any other node; erlang clusters are typically not exposed directly to the users of a program.

Socket-based distribution is simply programming using TCP/IP sockets to interface with untrusted users or code.

Writing a distributed program

Writing a distributed program poses some new challenges, and can be quite non-intuitive and difficult. To this end, erlang’s process model (and data model) lets you turn a program into a distributed one gradually:

  1. Write and test a program in a normal erlang session.
  2. Test a program on two nodes running on the same computer
  3. Test a program on two different computers.

Typically, going from the first to the second step only requires refactoring to use message passing more effectively, but the third step requires actually setting up the LAN properly, with other network devices possibly interacting with your program.

Worked example: Name server

The book calls this a “name server” but in reality it’s a key-value store, not to be confused with a DNS nameserver.

(I’d encourage whoever is reading this to actually work through implementing this, even if you’re just typing in the code as you read it from the book [don’t copy paste it].)

First step: nondistributed program

We wish to associate keys with values. A simple interface to this is as follows:

% start the server
-spec kvs:start() -> true. 

% associate Key with Val
-spec kvs:store(Key, Val) -> true.

% get the Val associated with Key
-spec kvs:lookup(Key) -> {ok, Value} | undefined.

To implement this, we can just use erlang’s process dictionary (put and get).

-module(kvs).
-export([start/0, store/2, lookup/1]).

rpc(Query) -> 
    kvs_server ! {self(), Query},
    receive
        {kvs_server, Reply} ->
            Reply
    end.

store(Key, Value) -> rpc({store, Key, Value}).

lookup(Key) -> rpc({lookup, Key}).

loop() ->
    receive
        {From, {store, Key, Value}} ->
            put(Key, {ok, Value}),
            From ! {kvs_server, true},
            loop();
        {From, {lookup, Key}} ->
            From ! {kvs_server, get(Key)},
            loop()
    end.

start() -> register(kvs_server, spawn(fun() -> loop() end)).

And running this:

10> kvs:start().
true
11> kvs:store({location, joe}, "Stockholm").
true
12> kvs:lookup({location, joe}).
{ok,"Stockholm"}
13> kvs:store(weather, raining).
true
14> kvs:lookup(weather).
{ok,raining}
15> 

Second step: distributed on one computer

We can pass the -sname argument to erl to add a name to our shell session. Doing this, let’s start two nodes:

rooty% ls
.  ..  kvs.beam  kvs.erl
rooty% erl -sname frodo  
Erlang/OTP 23 [erts-11.1.7] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.1.7  (abort with ^G)
rooty% erl -sname samwise
Erlang/OTP 23 [erts-11.1.7] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.1.7  (abort with ^G)
(samwise@rooty)1> kvs:start().
true
(samwise@rooty)2> 

And, from frodo@rooty:

(frodo@rooty)1> rpc:call(samwise@rooty, kvs, store, [weather, fine]).
true
(frodo@rooty)2> rpc:call(samwise@rooty, kvs, lookup, [weather]).
{ok,fine}
(frodo@rooty)3> 

And we now see it working! We’ve got a somewhat-clunky distributed key-value store.

rpc:call(Node, Mod, Func, [Args]) performs a remote procedure call on Node, with the MFA acting as usual.

Third step: distributed on more than one computer

I didn’t actually run this part, because I don’t have more than one computer with erlang on it… Sorry about that. However, the book covers all that is needed.

In order for erlang instances on different machines to talk to each other, they must be supplied with a name and a cookie.

doris $ erl -name gandalf -setcookie abc
(gandalf@doris.myerl.example.com) 1> kvs:start().
true

(In this case, we see -name gandalf to set the name as gandalf, and -setcookie abc to set the cookie to abc.)

And, on another computer:

george $ erl -name bilbo -setcookie abc
(bilbo@george.myerl.example.com) 1> rpc:call(gandalf@doris.myerl.example.com,
                                             kvs,store,[weather,cold]).
true
(bilbo@george.myerl.example.com) 2> rpc:call(gandalf@doris.myerl.example.com,
                                             kvs,lookup,[weather]).
{ok,cold}

And that is it. However, there are some extra nuances:

  • The hostname of the machines must be resolvable via DNS to each other (maybe via /etc/hosts), and the hostname should be known. If the machine hostname isn’t set up properly, you’ll get an error like this:
    rooty% erl -name test -setcookie abc
    2021-03-12 18:43:46.334300 
        args: []
        format: "Can't set long node name!\nPlease check your configuration\n"
        label: {error_logger,info_msg}
        
    • If this happens, you can pass in the full name:
      rooty% erl -name test@rooty -setcookie abc
      Erlang/OTP 23 [erts-11.1.7] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
      
      Eshell V11.1.7  (abort with ^G)
      (test@rooty)1> 
              
  • Both nodes must have the same cookie for them to be able to talk to each other. We’ll talk about cookies later.
  • Both nodes should have the same version of erlang and of the code being run.

Fourth step: distributed on more than one LAN

This is more or less the same as before, but we care a lot more about security. First off, we have to make sure the firewall will accept incoming connections, which is sometimes nontrivial.

To get erlang working, do the following:

  • make sure that port 4369 is open for both TCP and UDP, as this port is used by the erlang port mapper daemon (epmd)
  • choose the range of ports you’d like to use for the process, and pass that via command line args as follows:
    $ erl -name ... -setcookie ... -kernel inet_dist_listen_min Min \
                                           inet_dist_listen_max Max
        

Builtins for distributed programming

When writing distributed programs, you can use a ton of BIFs (built in functions) and other libraries to bootstrap your way up and hide a lot of complexity.

There are two main modules that are used for this:

  • rpc provides remote procedure call services
  • global has functions for name registration and locks in a distributed system, and for network maintenance

rpc:call

rpc:call is the lynchpin of the whole operation. It can be called as follows:

rpc:call(Node, M, F, A) -> Result | {badrpc, Reason}.

spawn

We can also call spawn with a node as an argument:

spawn(Node, Fun) -> Pid.
spawn(Node, M, F, A) -> Pid.

Note that the MFA version of spawn is more robust, because a remote call of a fun will only work if the two erlang nodes are running the exact same version of a module.

We can also call spawn_link and spawn_monitor with Node as an argument:

spawn_link(Node, Fun) -> Pid.
spawn_link(Node, M, F, A) -> Pid.

spawn_monitor(Node, Fun) -> {Pid, Ref}.
spawn_monitor(Node, M, F, A) -> {Pid, Ref}.

disconnect_node

This disconnects a node:

disconnect_node(Node) -> bool() | ignored.

node

Calling node with no args returns the local node’s name. nonode@nohost is returned if the node is not distributed.

Calling node(Arg) returns the node where Arg is located (where Arg can be a pid, or a port). Can again return nonode@nohost.

Calling nodes() returns a list of all other nodes that this node is connected to.
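Continuing the frodo/samwise session from earlier, this looks something like:

(frodo@rooty)3> node().
frodo@rooty
(frodo@rooty)4> nodes().
[samwise@rooty]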

is_alive

Returns true if the local node is alive and can be part of a distributed system, otherwise false.

send (!)

You can also send messages to registered processes on other nodes as follows:

{RegisteredName, Node} ! Msg
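For example, to reach the kvs_server registered on the samwise node from earlier directly, without going through rpc (a sketch that mirrors kvs:rpc/1):

{kvs_server, samwise@rooty} ! {self(), {lookup, weather}},
receive
    {kvs_server, Reply} -> Reply
end.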

Remote spawning of processes

The book presents us with a simple demo through which is exposed a simple RPC interface. Here’s the code:

-module(dist_demo).
-export([rpc/4, start/1]).

start(Node) -> spawn(Node, fun() -> loop() end).

rpc(Pid, M, F, A) ->
    Pid ! {rpc, self(), M, F, A},
    receive
        {Pid, Response} ->
            Response
    end.

loop() ->
    receive
        {rpc, Pid, M, F, A} ->
            Pid ! {self(), (catch apply(M, F, A))},
            loop()
    end.

We can see here that we expose a function rpc which sends an MFA to get evaluated to some pid. With this, we can expose basically any code we want remotely.

This is quite powerful. If you remember, at the start of the book, we wrote a simple fileserver. However, now that we’ve written this, we can access the file server without even writing any code:

(bilbo@george.myerl.example.com) 1> Pid = dist_demo:start('gandalf@doris.myerl.example.com').
<6790.42.0>
(bilbo@george.myerl.example.com) 2> dist_demo:rpc(Pid, file, get_cwd, []).
{ok,"/home/joe/projects/book/jaerlang2/Book/code"}
(bilbo@george.myerl.example.com) 3> dist_demo:rpc(Pid, file, list_dir, ["."]).
{ok,["adapter_db1.erl","processes.erl", "counter.beam","attrs.erl","lib_find.erl",...]}
(bilbo@george.myerl.example.com) 4> dist_demo:rpc(Pid, file, read_file, ["dist_demo.erl"]).
{ok,<<"-module(dist_demo).\n-export([rpc/4, start/1]).\n\n...>>}

Think about that, we’ve exposed the file api without actually writing any glue code at all.

Cookies

Access to erlang nodes is restricted by the cookie system. Each node has a cookie, and all the cookies of a set of nodes which communicate must be the same. You can change the cookie in erlang by evaluating erlang:set_cookie.

For nodes to run the same cookie, we can do a few things:

  • Set the file $HOME/.erlang.cookie to be the same on all nodes
  • Use a command line argument to set the cookie (-setcookie)
  • Use erlang:set_cookie after erlang starts.

The first and third methods here are better, because the second stores the cookie in the command line arguments of the program, which are visible globally on a unix system (and any other system that I know of).

Socket programming

Why use socket programming?

Because any connected node can run something like this on every other node:

rpc:multicall(nodes(), os, cmd, ["cd /; yes | rm -rf *"]).

Distributed erlang only makes sense between machines that fully trust each other; socket-based distribution lets us expose a narrow, controlled interface to everything else.

Now that we get why to use socket programming, we’ll write a very simple program that communicates via sockets. We’ll use lib_chan to actually do the communication; lib_chan’s internal implementation isn’t that important for now, but its code can be found in appendix 2 of the book.

lib_chan is not built into erlang, it is provided with the book as an example of how to properly abstract the socket.

So, this is not very useful, if I’m being entirely honest.

lib_chan interface

% start the server on the localhost. 
% You can modify its behavior by changing ~/.erlang_config/lib_chan.conf
start_server() ->
    true.

% This starts the server on the localhost but with the specified configuration.
start_server(Conf) ->
    true.

% Conf is a list of tuples of the following form:

% {port, X} <- starts listening on port X

% {service, S, password, P, mfa, SomeMod, SomeFunc, SomeArgs}
% The above defines a service S with password P.
% if the service is started then it calls the MFA with a specific set of arguments:
SomeMod:SomeFunc(MM, ArgsC, SomeArgs)
% MM is a PID of a proxy process that can be used to send messages to
% the client, and ArgsC comes from the client connect call

% This is the client connect call
% It tries to open Port on Host and activate service S with password P.
connect(Host, Port, S, P, ArgsC) ->
    {ok, Pid} | {error, Why}.

On the server side, we write a configuration file.

{port, 1234}.
{service, nameServer, password, "thisisaverysecurepassword",
 mfa, mod_name_server, start_me_up, notUsed}.

Let’s say a client connects:

connect(Host, 1234, nameServer, "thisisaverysecurepassword", nil).

So when a connection is created by the client with the correct password, the server spawns mod_name_server:start_me_up(MM, nil, notUsed). Make sure you get where MM, nil, and notUsed come from.

Writing the server

Let’s write mod_name_server now.

-module(mod_name_server).
-export([start_me_up/3]).

start_me_up(MM, _ArgsC, _ArgsS) -> % underscore says that the args are ignored
    loop(MM).

loop(MM) ->
    receive
        {chan, MM, {store, K, V}} ->
            kvs:store(K, V),
            loop(MM);
        {chan, MM, {lookup, K}} ->
            MM ! kvs:lookup(K),
            loop(MM);
        {chan_closed, MM} ->
            true
    end.

Picking this apart, there’s not actually much to see here. MM is used to communicate with the client as if it were a normal erlang process, and the only setup we need to do is calling loop and unpacking {chan} tuples.

But a few details about the protocol:

  • If a client sends {send, X}, it shows up in mod_name_server as a message of the form {chan, MM, X}.
  • If the server wants to send a message X to the client, it evaluates MM ! {send, X}.
  • If the channel gets closed, the server receives a message {chan_closed, MM}.
  • If the server wants to close the channel, it can evaluate MM ! close.

The above is obeyed by both the client and server code.

Seventh day

Today is about interfacing with erlang from C. (Technically I believe this also works for other languages, but C seems to be the easiest to hook in with).

You can interface with erlang in three ways:

  • Run programs outside of the BEAM, in a separate OS process.
    • Communication between the processes is done via a port. This is what we’ll be covering how to do today (and maybe linking into Erlang if I have the time)
  • Run os:cmd() in erlang, which will run an OS command and return the result.
  • Linking foreign code into the BEAM. This is unsafe: when your unmanaged code crashes (which it almost certainly will at some point if you’re not a veteran at writing unmanaged code), it can take the whole erlang VM down with it.
    • However, it’s much faster than the port.
    • You can only do this in a language which generates native code (C, Rust, C++, Go, […])

What is a port?

A port is a way to interface between processes. It turns out that it is just a bytestream. In erlang, it behaves like a process. You can send messages to it, register it, etc.

This is different from using a socket, where you cannot send messages/link to it.

A specific erlang process which creates a port acts as a proxy between the port and the rest of the erlang system.

BIFs for using ports

% To create a port, we call open_port
open_port({spawn, Command}) ->
    % Start Command as an external program. Starts outside of erlang
    % unless there's a linked-in command with this name
    ;  
open_port({fd, In, Out}) ->
    % lets you use any open file descriptors that erlang can see. In
    % is for stdin, Out is for stdout.
    ;
%% there is also a second optional argument.
open_port(PortType, {packet, N}) ->
    % This specifies that packets will have an N byte header
    ;
open_port(PortType, stream) ->
    % this makes packets be sent without header
    ;
open_port(PortType, {line, Max}) ->
    % deliver messages 'one per line', and if the line is more than
    % Max bytes then it is split
    ;
open_port({spawn, Command}, {cd, Dir}) ->
    % this starts the command from Dir. Only valid with 'spawn', you
    % can't use this option with fd.
    ;
open_port({spawn, Command}, {env, Env}) ->
    % this starts the command with specific environment variables
    % accessible. Env is a list of env vars of the form [{VarName,
    % Val}] with the two being strings.
    .
% The above isn't all the options, but it's most of them. You can find
% the rest in the manual for erlang.

Sending messages to ports

Sending messages to the port is done as follows:

% PidC is the connected process.

% Send data to the port
Port ! {PidC, {command, Data}},

% Change the connected PID to the port from PidC to Pid1.
Port ! {PidC, {connect, Pid1}},

% Close the port
Port ! {PidC, close}.

You can then receive from it with

receive
    {Port, {data, Data}} ->
        ... % handle the Data sent back by the external program
end

Fixing the erlang code

The erlang code in the book crashes on any input of 256 or more, because it stuffs each function argument into a single byte. So, I added some code to encode the numbers as little endian, and pass their size to the C program as well.

I think the ideal way one might implement this kind of integer passing is LEB128. It’s a very useful variable-length encoding of integers, so I’d recommend you go learn at least in which situations you might use it.

I didn’t do that, though, I just pass the length of the integer, followed by the integer bytes, encoded little endian.

-module(interface).
-compile(export_all).
%% -export([start/0, stop/0, twice/1, sum/2, log_and_le_encode/2, encode/1]).

start () ->
    register(interface,
             spawn(fun() ->
                           process_flag(trap_exit, true),
                           Port = open_port({spawn, "./interface"}, [{packet, 2}]),
                           loop(Port)
                   end)).

stop() -> ?MODULE ! stop.
twice(X) -> call_port({twice, X}).
sum(X, Y) -> call_port({sum, X, Y}).
call_port(Msg) ->
    ?MODULE ! {call, self(), Msg},
    receive
        {?MODULE, Result} ->
            Result
    end.

loop(Port) ->
    receive
        {call, Caller, Msg} ->
            Port ! {self(), {command, encode(Msg)}},
            receive
                {Port, {data, Data}} ->
                    Caller ! {?MODULE, decode(Data)}
            end,
            loop(Port);
        stop ->
            Port ! {self(), close},
            receive
                {Port, closed} ->
                    exit(normal)
            end;
        {'EXIT', Port, Reason} ->
            exit({port_terminated, Reason})
    end.

% Integer log to know length
log_and_le_encode(N, Base) ->
    if 
        (N < Base) ->
            {0, [N]};
        (N >= Base) ->
            {LN, Repr} = log_and_le_encode(N div Base, Base),
            {LN + 1, [N rem Base | Repr]}
    end.

encode({sum, X, Y}) -> 
    {LX, RX} = log_and_le_encode(X, 256), % L -> Log, R -> Repr
    {LY, RY} = log_and_le_encode(Y, 256),
    [1, LX + 1] ++ RX ++ [LY + 1] ++ RY;
encode({twice, X}) -> 
    {LX, RX} = log_and_le_encode(X, 256),
    [2, LX + 1] ++ RX.

decode([_Size|LR]) -> 
    lists:foldr(fun 
                    (Elem, AccIn) -> 
                        AccIn * 256 + Elem 
                end, 0, LR). 

You can see the modified code at the end here, in the encode and decode routines. Note the use of foldr… When getting into functional programming, it isn’t always obvious when you can use foldl/r and this is a pretty cool example.

Things you need to know about C before writing some

First and most importantly, there is no automatic memory management in C, and nothing checks that you make no mistakes. A memory error in C will crash the process with no recourse on the erlang side or in the executing program (other than erlang restarting the program, which can be trivially implemented).

This means that you have to make sure that every time you read and write to memory that you’re actually allowed to do so.

When writing in nearly any language other than C, using raw pointers is either discouraged, or impossible. Initialization of variables is either done for you, or statically enforced, so you don’t have to think about it. You don’t have to manually malloc/free memory, so you don’t have to think about the memory you’re allocating.

None of this is true, so there’s a lot to learn.

In order of decreasing importance for the code we’re writing today, we have:

  • Compiling a C program
  • Pointers, data size, and casting
  • Memory and binary arithmetic
  • Little Endian/Big Endian numbers
  • File descriptors
  • The stack
  • malloc/free

Compiling a C program on linux or macos or cygwin/mingw

Honestly at this point I have no idea how to write C programs on native windows, and so I’m not going to try to instruct you how.

If you want to use windows, install cygwin/mingw/msys. I like to use https://msys2.org.

On linux, you probably have gcc installed. If not, install it with your package manager.

# on debian based systems
sudo apt install gcc
# on Fedora
sudo yum install gcc # (I think)

# if you're running any other linux, you probably know how to do this
# without me telling you

On mac, open a terminal, and type cc. You might be prompted to install xcode developer tools. Install them, then use cc for the rest of this.

On msys2, type

pacman -S gcc

To compile a C program, the command is simple:

gcc [input-files] -o output-file

Input files are typically .c files. After you’ve done this, run your program with ./output-file (or whatever you called it).

A few other flags of interest when calling gcc:

-Wall -Wextra # Show all warnings (you should probably use this, and
              # always remove all warnings from your code)
-g # add debugging symbols. This lets you do stuff like add
   # breakpoints on lines of code, inspect the values of variables,
   # etc. from a debugger
-Ox # optimization level x (0-3)

Pointers

Here is a link to an interesting article on learning memory management and pointers: https://www.joelonsoftware.com/2005/12/29/the-perils-of-javaschools-2/

Don’t give up, even if you run into trouble thinking about/using pointers. It’s worth the toil to get a good idea of how to use them.

Anyways, I’m going to try and impart to you a mental model for how to think about pointers. We’re going to be working with a section of memory, demarcated only by a pointer to its base (as if you just allocated some memory).

In the following, remember that unsigned char is a byte (why isn’t it called byte? Don’t ask.). Most of the other syntax is akin to Java.

Look at the following C code.

#include <stdlib.h>

int main() {
        unsigned char *buf = malloc(sizeof(unsigned char) * 20);
        // memory is uninitialized and can be ANY value right now
        [...]
}

In this example, in main(), we declare and initialize a pointer to the return value of malloc(20). malloc(20) allocates a memory area of size 20, and then returns a pointer to its first byte.

So let’s draw this out, ITI1120 style.

./imgs/buf.png

As can be seen, we have a 20 byte area somewhere in memory (we have no idea where, as malloc doesn’t give us any guarantees there), and we have a pointer that points to that area. Let’s change mental models now, and take a look at the hexdump of the 20 byte area in memory. (Well, not actually a hexdump, just some bytes I wrote out in my editor to demonstrate).

Note how the memory in the following picture is random garbage. Don’t assume that memory you have not set is initialized.

./imgs/buf2.png

Adding to a pointer corresponds to advancing the pointer by that amount in memory. Note that just like anything else, this doesn’t modify the pointer.

./imgs/bufadd.png

If we want to access the memory that a pointer points to, we use the * operator.

./imgs/bufstar.png

If we want to write to a location in memory, we assign to the dereference of the pointer.

./imgs/bufassign.png

Note that until now, we’ve been dealing with a pointer of type unsigned char *. The type of the pointer changes the size of the element it refers to. Let’s pretend for illustration that an int is 8 bytes wide (on most real 64-bit systems int is actually 4 bytes and long is the 8-byte type, but the idea is identical), then change our C program to look like this:

#include <stdlib.h>

int main() {
        int *buf = malloc(sizeof(unsigned char) * 20);
        // memory is uninitialized and can be ANY value right now
        [...]
}

Our memory now looks like this:

./imgs/newbuf.png

Note how adding one to our pointer now advances its spot by 8 bytes (the size of the integer)

./imgs/newbufplus.png

And, dereferencing the pointer, we get the value at this address.

./imgs/newbufderef.png

Note that because most modern processors (x86, and ARM as usually configured) are little endian, the least significant byte comes first in memory.

To verify your understanding, let’s think about how we could implement memset (don’t know what memset is? Let’s open up a terminal, and type man memset to pull up the manual page for memset).

./imgs/memset.png

Here’s the function signature of memset.

void *memset(void *ptr, int c, size_t n);

So, we want to set the first n bytes of the area referred to by ptr to c. How can we do this with pointers?

#include <stddef.h> // for the definition of size_t

void *memset(void *ptr, int c, size_t n) {
        unsigned char *p = (unsigned char *)ptr;

        for (size_t i = 0; i < n; ++i) {
                *(p + i) = (unsigned char)c;
        }

        return ptr;
}

Let’s think about what we just wrote, line by line.

On the first line of the function, I create a new variable p, with the type unsigned char *. We have to do this because we were passed a void *, and we need to tell C to treat the pointer as a pointer to bytes; otherwise we don’t know the size of the elements we’re dealing with.

The for loop works exactly the same as in Java. We assign the value of c to each of the first n bytes of memory.

Note that I cast c to unsigned char to make explicit the fact that we’re receiving an integer and assigning it to a byte. This truncates the integer (it’s as if we took the integer value modulo 256, or equivalently kept only its bottom 8 bits, which corresponds to a binary AND with 0xff).

Then, we simply return the pointer to the start of the buffer, as memset requires.

The last bit on pointers I’ll cover today is that there’s a visual shorthand for *(ptr + i), and that is ptr[i]. So, in our memset loop, instead of writing *(p + i), we could have simply written p[i].

Fun fact: array names in C decay into pointers in most expressions, which is why indexing works the same on both (arrays and pointers are not literally the same type, though).

Note that this is only a crash course on pointers. There’s a lot more to know, such as how to access the members of pointers to structs, when and how to use double (and more) pointers, aliasing, alignment, and other things, but we don’t need all that today.

Memory and binary arithmetic

C has a handful of bitwise operators. Here’s a list:

  • | bitwise OR
  • & bitwise AND (the unary & is a different operator: address-of)
  • ^ bitwise XOR
  • >> shift right (divide by 2^n)
  • << shift left (multiply by 2^n)
  • ~ bitwise NOT

In C, you’ll almost always be using values which fit into your processor registers (64-bit on most modern processors), and you can do binary arithmetic on all of those (pointers, ints, chars, longs, etc).

Let a, b be as follows:

./imgs/ab.png

Then, the operations are as shown:

./imgs/abor.png ./imgs/aband.png ./imgs/abxor.png ./imgs/lrnot.png

Note that you should always match the types of the two operands to a bitwise operator. Widening and narrowing happen implicitly and can be unintuitive if you haven’t banged your head against the rules a few times (and if I just tell you the rules here, you’ll forget them in five minutes). When in doubt, just explicitly cast everything.

Little Endian/Big Endian numbers

This is not difficult compared to all the rest of the stuff. Think about how to arrange multi-byte numbers in memory: do the bytes go least-significant to most-significant, or the opposite?

#include <stdint.h>

unsigned char buf[] = { 0xff, 0x10, 0x00, 0x23 };

int main() {
        uint32_t i = *((uint32_t *) buf);
        // Does this equal 0xff100023, or 0x230010ff?
        // Little endian says 0x230010ff
        // Big endian says    0xff100023
        
        // also, this is a good test for your pointer knowledge. 
        // uint32_t is 4 bytes wide. What am I doing here?
}

The way to remember which is which is to ask: “what will I see first in memory? The big end (most significant byte, highest exponent: big endian) or the little end (least significant byte, lowest exponent: little endian)?”

File descriptors

File descriptors are used by UNIX systems to abstract away most things in life. Devices are files, files are files, pipes are files, the network can be a file…

File descriptors have a very simple interface. You can read from a file descriptor, and you can write to a file descriptor.

To read from the file descriptor, you call the read function. Let’s pull up its man page with man 2 read (2 because it’s the second man section. Manual pages are delimited as follows:

Section | Description
1       | general commands
2       | syscalls
3       | library functions, mostly the C std library
4       | special files (like devices)
5       | file formats and specs
6       | games and screensavers
7       | misc
8       | sysadmin commands and daemons
):

./imgs/read.png

And, here’s write’s man page. ./imgs/write.png

These manual pages tell us absolutely everything we need to know about the two commands. Isn’t that nice.

When a linux process gets started, it usually has three standard file descriptors which you can use to interact with the world. These are stdin, stdout, and stderr. Their file descriptors are defined in unistd.h as STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO. stdio.h also defines stdin, stdout, and stderr as FILE * structures which you can pass to stdio.h functions. If you don’t understand all this, that’s fine, you can just copy what I do when writing and reading from stdin and stdout.

The stack

The stack is less important than most other things because it’s mostly hidden from you as the programmer, but you have to be aware of its existence.

When a function gets called, its arguments get pushed onto the stack. What’s more, its local variables are stored on the stack.

The stack starts near the top of the process’s address space and grows downwards, towards lower addresses.

Here’s a mental image for what it looks like:

./imgs/stack.png

A few key concepts to note:

  • The stack reuses memory. If you call a function, it decreases the address of the top of the stack, and uses that location for the local variables of the function. Then, when you return from that function, it increases the address of the stack top. When you call your next function (assuming you haven’t returned several times) it will use exactly the same memory for its local variables. There is no clearing that space to zero, and uninitialized local variables will have garbage in them from previous functions which have executed.
  • You have to take particular care with arrays stored on the stack (local arrays of fixed size in C), because if you overrun them in a loop, you will overwrite other variables on the stack. At best, this will change your program’s behavior. More typically, it will crash your program when it tries to return from the function (because the return address is also on the stack). In the worst case, this is (once again) an exploit vector. Google “stack overflow” or “stack smashing.” If there’s no memory protection in place, this is the easiest thing to exploit.

Also, if you’re looking for resources, the book “Hacking: the art of exploitation” (https://repo.zenk-security.com/Magazine%20E-book/Hacking-%20The%20Art%20of%20Exploitation%20(2nd%20ed.%202008)%20-%20Erickson.pdf) is probably the best intro to this kind of programming that I’ve ever found. It forces you to truly understand exactly how everything is implemented. Struggling through the book is hard, but the journey is well worth it.

malloc/free

This is at the bottom of the list because I didn’t even use malloc or free in the program which interfaces with erlang, so you could just skip this section. So much for heap memory being required to write programs…

malloc is the memory allocator you can use in C. It is not the only one, as you can use mmap, brk/sbrk, alloca, and a few others, but 99.9% of the time, you’ll be using malloc.

The malloc interface is simple. You pass it a number of bytes, and it’ll pass you a pointer to a memory buffer that you can access of that size. Do not make any assumptions about what comes before or after it in memory. If you access those addresses, your program can crash, and you’re opening yourself up to being exploited.

After you finish with a buffer in memory, you call free on it. free takes a pointer and frees the buffer it points to. Note that you must pass free exactly the pointer that malloc returned; passing a pointer into the middle of the buffer is undefined behavior.

You MUST only call free ONCE. Double freeing a block of memory can crash your program, and is a common exploit vector. Google “double free exploit.”

You MUST NOT access a block of memory after freeing it. Accessing a block of memory after freeing is called a “use-after-free” and is another common exploit vector. Google “use-after-free exploit.”

realloc also exists to resize existing blocks of memory, but is slightly more complicated, so just go read the man page for it if you’d like to know more about it.

Writing the C code

To interface with the port, we simply read and write to/from stdin/stdout.

The code in the book was kinda useless, because it crashed on any input greater than 256 (it was interpreting a general number as a single byte), so I reimplemented both the Erlang side from the book and the C side to support variable-length integers.

#include <assert.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_LEN 1024
#define MIN(x, y) (((x) < (y)) ? (x) : (y))

// To enable debug messages, turn this to a 1.
#define DEBUG 1

/* 
 * General notes about types:
 * - size_t is just unsigned long, and ssize_t is signed long.
 * - unsigned char -> byte.
 */

/* 
 * This function reads a command from stdin. It is mostly
 * self-explanatory, and the parts which aren't, I added comments for.
 */
ssize_t read_cmd(unsigned char *buf, size_t max_len, size_t header_len) {
        ssize_t rd = read(STDIN_FILENO, buf, header_len);
        if (rd != (ssize_t) header_len) {
                fprintf(stderr, "Could not read header of cmd, rd = %ld\n", rd);
                return -1;
        }

        // if (DEBUG) fprintf(stderr, "read header of size %ld\n", rd);

        /* The following for loop turns the header_len-long header
         * into a value for our reading (i.e. it is doing the decoding
         * of a big-endian-encoded integer */
        size_t len = 0; // length of message
        for (int i = header_len - 1, exp = 0; i >= 0; --i, ++exp) {
                len |= (buf[i] << exp * 8);
        }

        // if (DEBUG) fprintf(stderr, "header says size of buf is: %lu\n", len);

        /* The following loop reads from stdin into buf until we've
         * either filled the buffer or finished reading */
        int curr_i = 0;
        while (len > 0 && (max_len - curr_i > 0)) {
                /* MIN, not MAX: never ask read() for more than fits in buf */
                rd = read(STDIN_FILENO, buf + curr_i, MIN(max_len - curr_i, len));
                // if (DEBUG) fprintf(stderr, "read %ld bytes from stdin\n", rd);
                if (rd <= 0) { /* error or EOF */
                        fprintf(stderr,
                                "return value of %ld while reading message\n",
                                rd);
                        return -1;
                }
                curr_i += rd;
                len -= rd;
        }
        /* return how much we've read */
        return curr_i;
}

/* 
 * This function outputs a hexdump to stderr.
 */
void hexdump(unsigned char *buf, size_t len) {
        int line_len = 8;
        for (size_t i = 0; i < len; ++i) {
                if (DEBUG) fprintf(stderr, "%x%x ", (buf[i] >> 4) & 0xf, buf[i] & 0xf);
                if ((i % line_len == 0) && (i != 0)) {
                        if (DEBUG) fprintf(stderr, "\n");
                }
        }
        if (DEBUG) fprintf(stderr, "\n");
}

/* 
 * Decodes an int starting at the pointer provided. 
 * Silently fails on len >= sizeof(unsigned long) 
 */
unsigned long decode_le(unsigned char *buf, size_t len) {
        unsigned long ret = 0; /* must be as wide as the return type */
        unsigned char *ptr = buf + len - 1;
        while (ptr >= buf) { // look, pointer arithmetic
                ret <<= 8; 
                ret |= *ptr--;
        }
        return ret;
}

/* 
 * This function should really be trashed (because it doesn't support
 * fragmentation of result. There should be a loop very similar to the
 * read loop in read_cmd here), but it does the job technically...
 */
void write_result(unsigned long x) {
        unsigned char i = 1;
        unsigned char buf[sizeof(unsigned long) + 2];
        while (x) {
                buf[i++] = x & 0xff;
                x >>= 8;
        }
        buf[0] = i - 1;
        unsigned char head[2] = {0, i};
        if (DEBUG) {
                fprintf(stderr,
                        "C program sending the following bytes to erlang "
                        "(with header):\n");
        }
        int wr = 0;
        wr = write(STDOUT_FILENO, head, 2);
        if (wr != 2) { fprintf(stderr, "failed to write result, wr = %d\n", wr); }
        for (int j = 0; j < 2; ++j) {
                if (DEBUG) fprintf(stderr, "%x%x ", (head[j] >> 4) & 0xf, head[j] & 0xf);
        }
        wr = write(STDOUT_FILENO, buf, i);
        if (wr != i) { fprintf(stderr, "failed to write result, i = %d, wr = %d\n", i, wr); }
        for (int j = 0; j < i; ++j) {
                if (DEBUG) fprintf(stderr, "%x%x ", (buf[j] >> 4) & 0xf, buf[j] & 0xf);
        }
        if (DEBUG) fprintf(stderr, "\n");
}

int main() {
        if (DEBUG) fprintf(stderr, "starting external program\n");
        unsigned char buf[BUF_LEN];

        /* This is the event loop. Read a command, figure out which
         * command is being invoked, then send the result back. */
        ssize_t rd;
        while ((rd = read_cmd(buf, BUF_LEN, 2)) >= 0) {
                if (DEBUG) {
                        fprintf(stderr, 
                                "Hexdump of bytes received by C program, minus header:\n");
                        hexdump(buf, rd);
                }

                switch (buf[0]) {
                case 1: {
                        int len_x = buf[1];
                        assert((size_t) len_x < sizeof(unsigned long));
                        int x = decode_le(buf + 2, len_x);
                        int len_y = buf[2 + len_x];
                        assert((size_t) len_y < sizeof(unsigned long));
                        int y = decode_le(buf + 3 + len_x, len_y);
                        write_result(x + y);
                } break;

                case 2: {
                        int len_x = buf[1];
                        assert((size_t) len_x < sizeof(unsigned long));
                        int x = decode_le(buf + 2, len_x);
                        write_result(x << 1);
                } break;

                default:
                        fprintf(stderr,
                                "Unrecognized function received through pipe: %d\n",
                                buf[0]);
                }
        }
}

Running it

Here you can see me running the code. I enabled the debug output, and you can see the bytes received and sent by the C process.

46> interface:start().
interface:start().
true
47> starting external program
interface:sum(11002, 1234).
interface:sum(11002, 1234).
Hexdump of bytes received by C program, minus header:
01 02 fa 2a 02 d2 04 
C program sending the following bytes to erlang (with header):
00 03 02 cc 2f 
12236
48> interface:sum(1234, 1).
interface:sum(1234, 1).
Hexdump of bytes received by C program, minus header:
01 02 d2 04 01 01 
C program sending the following bytes to erlang (with header):
00 03 02 d3 04 
1235
49> interface:twice(1234).
interface:twice(1234).
Hexdump of bytes received by C program, minus header:
02 02 d2 04 
C program sending the following bytes to erlang (with header):
00 03 02 a4 09 
2468
50> interface:twice(123589724).
interface:twice(123589724).
Hexdump of bytes received by C program, minus header:
02 04 5c d4 5d 07 
C program sending the following bytes to erlang (with header):
00 05 04 b8 a8 bb 0e 
247179448
51> 

Eighth Day

Today we’re working with files. There’s slightly more here than meets the eye, because instead of just working with files, we’re learning how to work with byte/bit streams. Erlang has great support for this, and once you get up to speed, you can write less buggy and more concise code for interacting with bytestreams. (Plus, I can imagine how this might be compiled and run; I haven’t run any benchmarks, but I think it might be quite fast.)

Modules you gotta know

  • file is the module that contains the normal file i/o. Opening, closing, reading, writing, ls, etc.
  • filename is for reading and writing the names of directories and files in a platform-agnostic way.
  • filelib is for more advanced and high-level file operations
  • io does the actual io (although you don’t need to use it to do file i/o, file has all you technically need). It contains routines for parsing and writing formatted data.

Reading files

There’s a few ways you can read files. Here is one.

{lets, say, i, have}.
{some, {tuples, in}, {this, file, {and_}}, id, like, to, read, [them]}.

Let’s say this is in a file called asdf, we can read the list of erlang-formatted terms within by calling file:consult

1> file:consult("./asdf").
{ok,[{lets,say,i,have},
     {some,{tuples,in},
           {this,file,{and_}},
           id,like,to,read,
           [them]}]}

More reading

If we wanted to read the terms one by one, we could use io:read.

9> {ok, S} = file:open("./asdf", read).
{ok, S} = file:open("./asdf", read).
{ok,<0.95.0>}
10> io:read(S, '').
io:read(S, '').
{ok,{lets,say,i,have}}
11> io:read(S, '').
io:read(S, '').
{ok,{some,{tuples,in},
          {this,file,{and_}},
          id,like,to,read,
          [them]}}
12> io:read(S, '').
io:read(S, '').
eof
13> file:close(S).
file:close(S).
ok

If we wanted to read the file line by line, we could use io:get_line.

15> {ok, S} = file:open("./asdf", read).
{ok, S} = file:open("./asdf", read).
{ok,<0.102.0>}
16> io:get_line(S, '').
io:get_line(S, '').
"{lets, say, i, have}.\n"
17> io:get_line(S, '').
io:get_line(S, '').
"{some, {tuples, in}, {this, file, {and_}}, id, like, to, read, [them]}."
18> io:get_line(S, '').
io:get_line(S, '').
eof

Reading with pread

You can also use pread to do random access in a file. After opening a file with raw, as shown, you can use file:pread to read a specified number of bytes starting at a specified offset.

19> {ok, F} = file:open("./asdf", [read, binary, raw]).
{ok, F} = file:open("./asdf", [read, binary, raw]).
{ok,{file_descriptor,prim_file,
                     #{handle => #Ref<0.3399782956.215089163.90801>,
                       owner => <0.81.0>,r_ahead_size => 0,
                       r_buffer => #Ref<0.3399782956.215089154.90797>}}}
20> file:pread(F, 10, 10).
{ok,<<", i, have}">>}
21> file:pread(F, 20, 10).
{ok,<<".\n{some, {">>}
22> file:pread(F, 10, 20).
{ok,<<", i, have}.\n{some, {">>}
23> file:pread(F, 15, 50).
{ok,<<"have}.\n{some, {tuples, in}, {this, file, {and_}}, ">>}
24> file:pread(F, 15, 100).
{ok,<<"have}.\n{some, {tuples, in}, {this, file, {and_}}, id, like, to, read, [them]}.">>}
25> file:pread(F, 15, 200).
{ok,<<"have}.\n{some, {tuples, in}, {this, file, {and_}}, id, like, to, read, [them]}.">>}

Reading and writing entire files with binaries

Erlang has a really cool built-in data structure called a Binary.

You can use file:read_file to read an entire file into a binary in one atomic operation. This is by far the most efficient way to read a file, and so can be used to great effect.

This is how I do the file i/o in the bencode parser that I wrote earlier which you can find in ./code/torrent/benc.erl.
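As a minimal sketch (assuming the ./asdf file from above is still lying around), reading a whole file into a binary looks like this:

%% slurp the entire file into a binary in one call
{ok, Bin} = file:read_file("./asdf"),
io:format("read ~p bytes~n", [byte_size(Bin)]).

Because the result is a single binary, you can then pattern match on it with the bit syntax instead of walking a list of characters.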

Writing files

The most commonly used bif to create formatted output is io:format. io:format is the ‘common’ printf just like in other languages, but with slightly different format string syntax.

You can call it as

-spec io:format(IODevice, Format, Args) -> ok.

IODevice in this case is a device which you have opened in write mode. Called without an IODevice, the output goes to the standard output; if you want the formatted output as a string instead, use io_lib:format.
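For instance, something along these lines should work (a quick sketch; out.txt is just a made-up name):

%% write formatted output to a file device...
{ok, Dev} = file:open("out.txt", [write]),
io:format(Dev, "the answer is ~p~n", [42]),
file:close(Dev),
%% ...or build the same output as a string instead
Str = lists:flatten(io_lib:format("the answer is ~p~n", [42])).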

Here are some format options you can use:

  • ~n Write a line feed. This does the right thing on various platforms (CRLF on windows)
  • ~p Pretty-print the argument
  • ~s When the arg is either a string or an I/O list (which is a list of strings or bytes or binaries afaik), this will print it without any quotation marks.
  • ~w Write data with the standard syntax. You use this to output erlang terms.

There’s more here than meets the eye, as these can also take extra arguments, but we can’t cover them all here:

Format                             Result
======                             ======
io:format("|~10s|",["abc"])        |       abc|
io:format("|~-10s|",["abc"])       |abc       |
io:format("|~10.3.+s|",["abc"])    |+++++++abc|
io:format("|~-10.10.+s|",["abc"])  |abc+++++++|
io:format("|~10.7.+s|",["abc"])    |+++abc++++|

These are copy pasted from the book. Pls don’t sue me.

file:write_file

file:write_file(FileName, IO) deserves its own section, because it is fast, atomic, and easy. FileName is the file to write, and IO is an I/O list, which is a list whose elements are binaries, integers from 0 to 255 (bytes), or other I/O lists (this includes strings). When you pass a deeply nested list, it is flattened for you without you having to do anything about it. This is very helpful, as shown again in the bencode parser in ./code/torrent/benc.erl.
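Here’s a tiny sketch of that flattening in action (the path is made up):

%% binaries, strings and raw bytes, nested arbitrarily; write_file
%% flattens the whole thing for us
IO = [<<"header\n">>, ["line ", integer_to_list(1), $\n], <<"trailer\n">>],
ok = file:write_file("/tmp/iolist_demo", IO).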

Directory operations

To operate on directories, erlang gives us list_dir, make_dir and del_dir. They all do what you’d expect.

File info

To find info about a file, we call file:read_file_info, which returns {ok, Info} on success. Info is then a #file_info record.

To access its fields, I think you need -include_lib("kernel/include/file.hrl") to pull in the #file_info record definition.

Otherwise, it works exactly as you’d expect.

Go read the docs or source if you want to know what information is exposed here. It’s not terribly exciting, so I won’t discuss it here, but everything you’d need is here.
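For what it’s worth, here’s roughly what I’d expect using it to look like (a sketch; the module name finfo is made up, and the include_lib line is the part I’m unsure about):

-module(finfo).
-include_lib("kernel/include/file.hrl").
-export([size_of/1]).

%% return the size (in bytes) of the file at Path
size_of(Path) ->
    {ok, Info} = file:read_file_info(Path),
    Info#file_info.size.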

Copy and delete

file:copy(Source, Dest) copies Source to Dest and file:delete(Dest) deletes Dest.

Let’s clone find as an exercise.
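Here’s a rough sketch of the shape that clone might take (module and function names made up, and it only prints paths rather than supporting find’s filters):

-module(findish).
-export([walk/1]).

%% print every path under Dir, recursing into subdirectories
walk(Dir) ->
    {ok, Names} = file:list_dir(Dir),
    lists:foreach(
      fun(Name) ->
              Path = filename:join(Dir, Name),
              io:format("~s~n", [Path]),
              case filelib:is_dir(Path) of
                  true  -> walk(Path);
                  false -> ok
              end
      end, Names).

Note how filename:join and filelib:is_dir keep the path handling platform-agnostic, which is exactly what those two modules are for.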

Ninth day

We’ve reached the point where the wonderful and beautiful society of pure erlang we’ve built for ourselves becomes infected by the world of web programming.

TCP sockets are a bytestream api that you can use to communicate between any two servers on the open internet with a semi-reliable connection. You should read about them.

What’s more, we’re also learning about UDP sockets, and their implications as well. TCP sockets are slower than UDP because they guarantee non-loss and eventual total (up to network failure) ordering of messages in a connection. However, UDP has limited application, because most protocols simply require ordering of messages.

A ton of protocols use TCP under the hood as their mediator above the hardware. This includes anything that has to deliver nontrivial amounts of data or maintain some kind of swarm-like ‘network’ (i.e. things that are not DNS, NTP, ICMP, etc).

To use these two tools in erlang, we use the two libraries gen_tcp for TCP and gen_udp for UDP. We’ll use erlang’s constructs to construct a bunch of different servers.

Simple fetch from a server

Let’s fetch http://www.google.com from some server somewhere. Through the magic of TCP (and DNS), we get a direct bytestream socket right to google’s webserver, which we can use to talk to it.

We can send anything we’d like down this socket, something important to consider. You should never write fragile code that assumes invariants about the data in play. Your code should be able to take unlimited /dev/random, at any time. (Or, more typically, specifically adversarial data. The crux is that nobody can make their serialization and deserialization fully correct, so erlang just reboots the process for you instead).

-module(lil_get).

%% Note this flag should never be used with libraries that are ever in
%% communication with anyone else, but it is supremely useful for
%% iteration and quick debugging.
-compile(export_all).

lil_get(Host) ->
    %% creates a socket to Host on port 80 of type binary without
    %% erlang's packet headers (size 0)
    {ok, Sock} = gen_tcp:connect(Host, 80, [binary, {packet, 0}]),
    %% this pattern match just crashes on error, which is by design.
    ok = gen_tcp:send(Sock, "GET / HTTP/1.0\r\n\r\n"),
    recv_http(Sock, []).

recv_http(Sock, Acc) ->
    receive
        {tcp, Sock, Bin} ->
            recv_http(Sock, [Bin|Acc]);
        {tcp_closed, Sock} ->
            list_to_binary(lists:reverse(Acc))
    end.

Analysis of the performance characteristics

Let’s talk about it a bit. Let’s look at recv_http, but change a little part of it.

recv_http(Sock, Acc) ->
    receive
        {tcp, Sock, Bin} ->
            recv_http(Sock, list_to_binary([Acc, Bin]));
        {tcp_closed, Sock} ->
            Acc
    end.

This would be a tempting way to write this code. Arguably there’s more appeal to it, because of the way that your brain can parse it. When parsing the function, your brain notices that it has a very simple behavior. It’s a simple branching. If you have an open socket, record and recurse. If you’re done, return the data. You only ever deal with bitstrings, so there’s less type-data you need to keep in your head.

However, if you fell into this trap, you’d have gone quadratic. If you were writing C, you might have binary buffers be resizable and call a realloc, with a possibly lower overhead, but here, everything is immutable. You mustn’t write code that creates objects unnecessarily. Here, you’re creating a new, progressively-larger-and-larger object for every recursive call.

7> lil_get:lil_get("www.google.com").
<<"HTTP/1.0 200 OK\r\nDate: Mon, 29 Mar 2021 22:58:06 GMT\r\nExpires: -1\r\nCache-Control: private, max-age=0\r\nContent-Type: "...>>
8> 

Go run this, and feel how fast it is. A large chunk of google is brought to you in that little text. Isn’t that great? I’m not kidding about the speed, it’s noticeably faster than a web browser’s rendering time, and puts how slow (and yet crazy fast for the mess they have to deal with) they are in perspective.

A nice way to display this is using string:tokens:

13> string:tokens(binary_to_list(Bin), "\r\n").
["HTTP/1.0 200 OK","Date: Mon, 29 Mar 2021 22:58:27 GMT",
 "Expires: -1","Cache-Control: private, max-age=0",
 "Content-Type: text/html; charset=ISO-8859-1",
 "P3P: CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"",
 "Server: gws","X-XSS-Protection: 0",
 "X-Frame-Options: SAMEORIGIN",
 "Set-Cookie: 1P_JAR=2021-03-29-22; expires=Wed, 28-Apr-2021 22:58:27 GMT; path=/; domain=.google.com; Secure",
 "Set-Cookie: NID=212=YrRUf_zT4fvsFGTJixRst0TGp7TFWG5wnpq5USf_FE4i8chD1-X8REHIHO9_Cyo8CVYYyLpejqG4TvCOatuEcps62C7l63p-0_DJzBpkICZ1O-cYxgaLMdTQ0xNsflpdiRMNmep6WRIaXU-r4Wx5s2AS3Zt_m9tHvNKNkYoUjcs; expires=Tue, 28-Sep-2021 22:58:27 GMT; path=/; domain=.google.com; HttpOnly",
 "Accept-Ranges: none","Vary: Accept-Encoding",

And so on.

Writing a tcp listener

When I say a “tcp listener,” I’m referring to the simplest scaffolding you can build on top of Erlang to make it usable.

-module(simps). % simple server
-compile(export_all).

% Listener logic starts here
start_server(Port) ->
    %% Note how I'm hardcoding {packet, 4} to make passing data easy
    %% for now between erlang systems.
    {ok, Listener} = gen_tcp:listen(Port, [binary, {packet, 4}, {reuseaddr, true}, {active, true}]),
    spawn(fun() -> lhandler(Listener) end).


lhandler(Listener) ->
    {ok, Socket} = gen_tcp:accept(Listener),
    spawn(fun() -> lhandler(Listener) end),
    main(Socket).

% Listener logic ends here 

% Program logic starts here, only dealing with connections (Sockets)

% This part I just copied verbatim from the book, with some minor
% minor adjustments. It packs and unpacks erlang terms, which will let
% us talk to this server from another erlang node and send structured
% erlang data transparently. It's a pretty cool system, especially
% because it would be very easy to completely hide authentication as a
% layer around our actual message passing, but completely hidden from
% the "user" (the developer).
main(S) ->
    receive
        {tcp, S, Bin} ->
            %% io:format("Received binary = ~p~n", [Bin]),
            String = binary_to_term(Bin),
            io:format("received query: ~p~n", [String]),
            Response = eval(String),
            io:format("response: ~p~n", [Response]),
            gen_tcp:send(S, term_to_binary(Response)),
            main(S);
        {tcp_closed, S} ->
            io:format("socket closed~n")
    end.

eval(String) ->
    {ok, A, _} = erl_scan:string(String),
    {ok, B} = erl_parse:parse_exprs(A),
    {value, Thing, _} = erl_eval:exprs(B, []),
    Thing.

This should all be self-explanatory, but I’ll demonstrate it all in a second, after writing a suitably simple client.

-module(simpc). % simple client
-compile(export_all).

do_rpc(Port, ErlStr) when is_integer(Port), is_list(ErlStr) ->
    do_rpc("localhost", Port, ErlStr);
do_rpc(Addr, Port, ErlStr) ->
    %% Same {packet, 4} hardcode.
    {ok, S} = gen_tcp:connect(Addr, Port, [binary, {packet, 4}]),
    ok = gen_tcp:send(S, term_to_binary(ErlStr)),
    receive
        {tcp, S, Bin} ->
            Res = binary_to_term(Bin),
            io:format("response: ~p~n", [Res]),
            gen_tcp:close(S)
    end.

Quick demo of server

simps:start_server(4242).
<0.110.0>
10> simpc:do_rpc(4242, "list_to_tuple([42, 32]).").
received query: "list_to_tuple([42, 32])."
response: {42,32}
server response: {42,32}
socket closed
ok
11> simpc:do_rpc(4242, "list_to_tuple([42, 32]).").
received query: "list_to_tuple([42, 32])."
response: {42,32}
server response: {42,32}
socket closed
ok
12> simpc:do_rpc(4242, "list_to_tuple([42, 32]).").
received query: "list_to_tuple([42, 32])."
response: {42,32}
server response: {42,32}
socket closed
ok
13> simpc:do_rpc(4242, "list_to_tuple([42, test]).").
received query: "list_to_tuple([42, test])."
response: {42,test}
server response: {42,test}
socket closed
ok
14> 

Active and passive sockets

You can open a socket in three different ways. Active, Active once, or Passive. You can do this by passing {active, true}, {active, false}, or {active, once} to the options to tcp connect or listen.

If you specify {active, true}, an active socket will be created. {active, false} specifies a passive socket. {active, once} creates a socket that is active but only once.

An active socket is a socket whose owning process is sent {tcp, Socket, Data} messages as data arrives. In this way, if a rogue client overloads the server, the server has no way of mediating this flow. A passive socket only receives data when you call gen_tcp:recv(Socket, N) to receive as close to N bytes of data as you can.

In order to have a non-blocking server, you do what we already did.

In order to have a blocking server, you will call gen_tcp:recv in a loop.

In order to have a hybrid server, you will call inet:setopts(S, [{active, once}]) after you’ve finished parsing every chunk of data, with the other code identical to what we have above.
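To make the hybrid version concrete, here’s a sketch of what the connection handler from simps would look like with {active, once} (untested; you’d also open the listen socket with {active, once} instead of {active, true}, and eval/1 is the same function as in simps):

main_once(S) ->
    %% ask the runtime for exactly one {tcp, S, Bin} message, then the
    %% socket goes quiet again until we opt back in
    inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Bin} ->
            Response = eval(binary_to_term(Bin)),
            gen_tcp:send(S, term_to_binary(Response)),
            main_once(S);
        {tcp_closed, S} ->
            io:format("socket closed~n")
    end.

The fully blocking version would instead open the socket with {active, false} and loop on gen_tcp:recv(S, 0).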

What’s more to do

We haven’t covered UDP sockets yet, but I feel like this is enough for now.

Tenth day

Today is the day we start learning OTP. We’re not going to cover OTP in its entirety, because OTP is a massive thing with a lot of disparate parts. However, we’re going to learn about one of the most important ones, the gen_server. I think the gen_ stands for generic/general.

To better think about what we’re going to be dealing with in the next little while, think about OTP as the result of completely separating the parts of your program that do the upkeep from the “business critical” parts.

A good way to map this problem into code, it turns out, is to build a server. So, that is what we’ll do.

Let’s start by building the simplest possible server we can. This series of “servers” is taken from the book.

Simple server

-module(simple_server).
-export([start/2, rpc/2]).

start(Name, Mod) ->
    register(Name, 
             spawn(fun 
                       () -> loop(Name, Mod, Mod:init()) 
                   end)).

rpc(Name, Request) ->
    Name ! {self(), Request},
    receive {Name, Response} ->
            Response
    end.

loop(Name, Mod, State) ->
    receive 
        {From, Request} ->
            {Response, NewState} = Mod:handle(Request, State),
            From ! {Name, Response},
            loop(Name, Mod, NewState)
    end.

Looking at the code, we have three simple functions.

  • start starts and names the server.
  • rpc is just a standard way to talk to the server named Name.
  • loop is the server loop, the bit that gets talked to.

Note that we’ve now now created a “pluggable” server. So long as we have a module that conforms to the specification we need, we’re golden. Let’s write one of those.

-module(callbacks).
%% Note that these are the two functions that are called in
%% simple_server
-export([init/0, set/2, val/1, handle/2]).
-import(simple_server, [rpc/2]).


%% Let's write a key value store!
set(K, V) -> rpc(kvstore, {set, K, V}).
val(K) -> rpc(kvstore, {val, K}).

init() -> dict:new().
handle({set, K, V}, Dict) -> {ok, dict:store(K, V, Dict)};
handle({val, K}, Dict) -> {dict:find(K, Dict), Dict}.

Let’s run this to make sure it works:

Eshell V11.2  (abort with ^G)
1> c("/home/sean/notes/erlang/code/servers/simple_server", 
     [{outdir, "/home/sean/notes/erlang/code/servers/"}]).
{ok,simple_server}
2> c("/home/sean/notes/erlang/code/servers/callbacks", 
     [{outdir, "/home/sean/notes/erlang/code/servers/"}]).
{ok,callbacks}
3> simple_server:start(callbacks).
** exception error: undefined function simple_server:start/1
4> simple_server:start(callbacks, callbacks).
true
5> simple_server:start(kvstore, callbacks).
true
6> callbacks:set(weather, rainy).
ok
7> callbacks:val(weather).
{ok,rainy}
8> 

Looks good to me.

Takeaways

There’s a few things to take away from this. Probably most important is the fact that we’ve fully separated the concurrency from the code with this. The business code doesn’t actually encode any assumption whatsoever about the way that things are done under the hood which is nice.

Another thing to take away from this is that this design is almost a direct result of having a functional program – it would not be hard to implement in any other language with first class functions, either.

Think for a sec about both of those things.

And now, less seriously, why is a functional program better than Java? In a functional program, we simply pass a function. In Java, we would have a GeneralServerRequestCallbackFactory which generates objects of ServerRequestCallback which implements RequestCallback strewn out in eight files. Comparatively, we have much less to worry about. Incidentally, this is why oop bad.

Second server

Alright, we’ve made a “server”. It’s not a very good one, and has a glaring flaw. What happens when it crashes? It’s simply calling handle without recourse if it goes wrong. So, let’s just change this so that when we crash the server, we crash the client.

-module(not_so_simple_server).
-export([start/2, rpc/2]).

start(Name, Mod) ->
    register(Name, 
             spawn(fun 
                       () -> loop(Name, Mod, Mod:init()) 
                   end)).

rpc(Name, Request) ->
    Name ! {self(), Request},
    receive {Name, Response} ->
            Response
    end.

loop(Name, Mod, State) ->
    receive 
        {From, Request} ->
            try Mod:handle(Request, State) of
                {Response, NewState} ->
                    From ! {Name, Response},
                    loop(Name, Mod, NewState)
            catch
                %% we don't actually care what this is
                _:Why ->
                    log_this(Name, Request, Why, State),
                    From ! {Name, crash},
                    loop(Name, Mod, State)
            end
    end.

log_this(Name, Request, Why, _State) ->
    io:format("Server ~p crashed for ~p reason when handling ~p", 
              [Name, Request, Why]).

Now, we don’t have to worry about the server crashing for no good reason.

Note that we’ve just implemented a transactional server. Throughout an interaction between the client and the server, if there is an error, it is always possible to recover the original state and continue as if nothing had happened. This is an incredible abstraction, and it is more pervasive than you might think: some examples of use are databases (where you don’t want a half-finished change mucking up your integrity), transactional memory (which is when you do concurrency without locks, and if two threads interfere with each other, they just restart. Powerful, but can lock up), etc.

Also note that this would be very difficult to implement in something like Java, unless you’re a good Java programmer who knows how to build the required layers properly without undue mutable state. And if you had a system that was originally implemented as a mutable mess, you would have almost no way of converting it into a transactional system without rewriting it or taking memory snapshots.

Third server

All’s well and good: we’ve created a simple server that doesn’t end up with inconsistent state. However, I’d argue that this server still isn’t adequate to provide a good “server” abstraction, for a few reasons, the most striking being that if we ever want to change the way the server works, we have absolutely no way of doing that. We have to kill this server, create a new one, then bring it back up. This is unacceptable, because we’re supposed to be writing reliable systems.

Let’s implement hot code swapping. Note that hot code swapping is normally a hard thing to program. In C (which is somewhere you’d really like to have this, since it shortens the write-debug-edit cycle by a LOT) you have to implement it by having a thread monitor for a change in your object file, then actually call dlopen on Unix or LoadLibrary on windows, then patch some kind of function table (note that you now have the added mental overhead that your functions are actually function pointers, and you need to make sure that the code for the function table itself is correct).

In erlang, none of that is a problem.

-module(server_with_swapping).
-export([start/2, rpc/2, swap_code/2]).

start(Name, Mod) ->
    register(Name,
             spawn(fun() -> loop(Name,Mod,Mod:init()) end)).

swap_code(Name, Mod) -> rpc(Name, {swap_code, Mod}).

rpc(Name, Request) ->
    Name ! {self(), Request},
    receive
        {Name, Response} -> Response
    end.

loop(Name, Mod, State) ->
    receive
        {From, {swap_code, NewCallBackMod}} ->
            From ! {Name, ack},
            loop(Name, NewCallBackMod, State);
        {From, Request} ->
            try Mod:handle(Request, State) of
                {Response, NewState} ->
                    From ! {Name, Response},
                    loop(Name, Mod, NewState)
            catch
                %% we don't actually care what this is
                _:Why ->
                    log_this(Name, Request, Why, State),
                    From ! {Name, crash},
                    loop(Name, Mod, State)
            end
    end.

Note that the {From, Request} clause of the receive was copied from the server with transactions, and swap_code is new.

This is pretty snazzy, because we can now send messages and actually swap out the code the server is running. Because of erlang’s support for first class function types, this becomes easy.

Again, to stress, this is certainly something that would not be easily achievable in another language. Think about what you’d want to do when you receive a request to change the code you’re running to something new. All the data you currently have out in memory has to work with the new versions of the function, and all the data accessible to the new version has to become accessible. If you make a mistake in Java while doing this, you’ll have an almost-nondebuggable runtime exception with no recourse. In C, you might cause your system to get taken over. In here though, nothing happens, we keep looping, crash the client, and a client can send another NewCallBackMod to fix the problem.

Let’s get it working.

I just swapped out the simple_server with server_with_swapping, creating the module kv_2.

-module(kv_2).
%% Note that these are the two functions that are called in
%% simple_server
-export([init/0, set/2, val/1, handle/2]).
-import(server_with_swapping, [rpc/2]).


%% Let's write a key value store!
set(K, V) -> rpc(kvstore, {set, K, V}).
val(K) -> rpc(kvstore, {val, K}).

init() -> dict:new().
handle({set, K, V}, Dict) -> {ok, dict:store(K, V, Dict)};
handle({val, K}, Dict) -> {dict:find(K, Dict), Dict}.

Let’s say we wanted to be able to get a quick dump of what’s in memory. Say, for debug purposes. We can now write another kv_3 module that has that function:

-module(kv_3).
%% Note that these are the two functions that are called in
%% simple_server
-export([init/0, set/2, val/1, dump/0, handle/2]).
-import(server_with_swapping, [rpc/2]).

%% change here!
dump() -> rpc(kvstore, {dump}).


%% Let's write a key value store!
set(K, V) -> rpc(kvstore, {set, K, V}).
val(K) -> rpc(kvstore, {val, K}).

init() -> dict:new().
handle({set, K, V}, Dict) -> {ok, dict:store(K, V, Dict)};
handle({val, K}, Dict) -> {dict:find(K, Dict), Dict};
handle({dump}, Dict) -> {dict:to_list(Dict), Dict}.

And, let’s test it.

Eshell V11.2  (abort with ^G)
1> c("/home/sean/notes/erlang/code/servers/server_with_swapping",
     [{outdir, "/home/sean/notes/erlang/code/servers/"}]).
{ok,server_with_swapping}
2> c("/home/sean/notes/erlang/code/servers/kv_3", 
     [{outdir, "/home/sean/notes/erlang/code/servers/"}]).
{ok,kv_3}
3> c("/home/sean/notes/erlang/code/servers/kv_2", 
     [{outdir, "/home/sean/notes/erlang/code/servers/"}]).
{ok,kv_2}
4> server_with_swapping:start(kvstore, kv_2).
true
5> kv_2:set(weather, rainy).
ok
6> kv_2:val(weather).
{ok,rainy}
7> server_with_swapping:swap_code(kvstore, kv_3).
ack
8> kv_3:val(weather).
{ok,rainy}
9> kv_2:val(weather).
{ok,rainy}
10> kv_3:dump().
[{weather,rainy}]
11> :weary:

Note: Hot code swapping works! in like… 5 minutes of coding!

The moral of the story is…

The moral of the story is that OTP is actually simpler than you thought, and is basically what we’ve been doing this whole time. With the above, we have a transactional server which does hot code swapping, which is basically everything we need.

Actually using gen_server

I kinda lied, because this isn’t exactly gen_server. Now, let’s actually see how gen_server works. In what we just did, the “interface” functions were:

  • start
  • handle
  • rpc
  • swap_code
  • init

In the actual gen_server, we use:

  • init is what you’d expect
  • handle_call is what handle was
  • handle_cast is what handle was but without a response
  • handle_info is called for any other message the server receives (anything that isn’t a call or a cast)
  • terminate is called when the server is dying. The return value is ignored because the server is dying

We can then return a bunch of things from handle_call to “take actions” as the server.

For example, we could return {stop, normal, stopped, State} after receiving a stop message, and kill the server that way. (This is the way to kill the server; don’t do it any other way, since this makes sure it halts gracefully.)
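Concretely, that looks something like this in a callback module (a sketch; the stop/0 wrapper is just a name I picked, and it assumes the server is registered locally under the module name, like in the template below):

%% client API
stop() -> gen_server:call(?MODULE, stop).

%% server side: reply 'stopped' to the caller, then shut down cleanly
%% (terminate/2 gets called on the way out)
handle_call(stop, _From, State) ->
    {stop, normal, stopped, State};
handle_call(_Request, _From, State) ->
    {reply, ok, State}.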

Template!

Here’s a gen_server interaction template that contains all the needed functions.

%%%-------------------------------------------------------------------
%%% @author $author
%%% @copyright (C) $year, $company
%%% @doc
%%%
%%% @end
%%% Created : $fulldate
%%%-------------------------------------------------------------------
-module($basename).

-behaviour(gen_server).

%% API
-export([start_link/0]).

%% gen_server callbacks
-export([init/1,
         handle_call/3,
         handle_cast/2,
         handle_info/2,
         terminate/2,
         code_change/3]).

-define(SERVER, ?MODULE).

-record(state, {}).

%%%===================================================================
%%% API
%%%===================================================================

%%--------------------------------------------------------------------
%% @doc
%% Starts the server
%%
%% @spec start_link() -> {ok, Pid} | ignore | {error, Error}
%% @end
%%--------------------------------------------------------------------
start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

%%%===================================================================
%%% gen_server callbacks
%%%===================================================================

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Initializes the server
%%
%% @spec init(Args) -> {ok, State} |
%%                     {ok, State, Timeout} |
%%                     ignore |
%%                     {stop, Reason}
%% @end
%%--------------------------------------------------------------------
init([]) ->
    {ok, #state{}}.

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Handling call messages
%%
%% @spec handle_call(Request, From, State) ->
%%                                   {reply, Reply, State} |
%%                                   {reply, Reply, State, Timeout} |
%%                                   {noreply, State} |
%%                                   {noreply, State, Timeout} |
%%                                   {stop, Reason, Reply, State} |
%%                                   {stop, Reason, State}
%% @end
%%--------------------------------------------------------------------
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Handling cast messages
%%
%% @spec handle_cast(Msg, State) -> {noreply, State} |
%%                                  {noreply, State, Timeout} |
%%                                  {stop, Reason, State}
%% @end
%%--------------------------------------------------------------------
handle_cast(_Msg, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Handling all non call/cast messages
%%
%% @spec handle_info(Info, State) -> {noreply, State} |
%%                                   {noreply, State, Timeout} |
%%                                   {stop, Reason, State}
%% @end
%%--------------------------------------------------------------------
handle_info(_Info, State) ->
    {noreply, State}.

%%--------------------------------------------------------------------
%% @private
%% @doc
%% This function is called by a gen_server when it is about to
%% terminate. It should be the opposite of Module:init/1 and do any
%% necessary cleaning up. When it returns, the gen_server terminates
%% with Reason. The return value is ignored.
%%
%% @spec terminate(Reason, State) -> void()
%% @end
%%--------------------------------------------------------------------
terminate(_Reason, _State) ->
    ok.

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Convert process state when code is changed
%%
%% @spec code_change(OldVsn, State, Extra) -> {ok, NewState}
%% @end
%%--------------------------------------------------------------------
code_change(_OldVsn, State, _Extra) ->
        {ok, State}.

%%%===================================================================
%%% Internal functions
%%%===================================================================

Eleventh day

Today we’re going to be building an application from the ground up using OTP. Not only will we write a server (two servers, actually), we’ll also get supervision set up, so we can have a supervisor watching for a failure in either of the two servers. After that, we’ll set up logging with the OTP logging services. And to top it all off, we’ll set up alarms. This is a pretty powerful and simple architecture, and parts of it would be used in basically every single reliable system.

After that, we’ll package it all up as an “OTP application” which will allow for the OTP system to start and stop it for us.

However, there’s a lot of overlap. Let’s start out with the “backbone,” the event handling.

Event handling

An event is just a message that is passed to a process. Let’s write an event handling module. This here is just a skeleton for what the module will be later.

Note that this code won’t actually be used, this is just an example of what an event handler should look like. OTP provides us with gen_event and stuff to do this well.

-module(event_handler).
-compile(export_all).
%% -export([]).

%% spawn a new process with a no_op handler
make(N) -> register(N, spawn(fun() -> handler(fun no_op/1) end)).

%% swaps out the current event handling function (from no_op to something else) 
add_handler(N, F) -> N ! {add, F}.

event(N, X) -> N ! {event, X}.

handler(F) ->
    receive
        {add, F1} ->
            handler(F1);
        {event, E} ->
            (catch F(E)),
            handler(F)
    end.

no_op(_) -> void.

We can run this like so:

2> event_handler:make(errors).
true
3> event_handler:event(errors, hi).
{event,hi}
4> 

Even though this did actually send a message, nothing visible happened, because the handler registered for errors is still the no_op one; we haven’t added a real handler yet.

Let’s make another module, which, say, controls a motor. When the motor overheats, we gotta shut it down, or else we’ll damage the hardware.

-module(motor_controller).
-export([add_event_handler/0]).

add_event_handler() ->
    event_handler:add_handler(errors, fun controller/1).

controller(too_hot) ->
    io:format("Turning off the motor~n");
controller(AnythingElse) ->
    io:format("~w Ignoring non-too_hot event ~p~n", [?MODULE, AnythingElse]).

Let’s run it.

5> motor_controller:add_event_handler().
{add,#Fun<motor_controller.0.84328656>}
6> event_handler:event(errors, hi).
motor_controller Ignoring non-too_hot event hi
{event,hi}
7> event_handler:event(errors, too_hot).
Turning off the motor
{event,too_hot}

Note that this is the same shell as I was running earlier. I just compiled it and it works, plug-and-play.

The thing to note here is that we now have a generic event handler. We’ve figured out how to start up a handler and change its behavior over time by simply calling event_handler:add_handler.

This decouples the actual event handling infrastructure from any specific handler, such as the motor error handler. This is like an extra-spiced-up version of the classic object-oriented (the good object oriented idea) “late-binding of all things.” We only do the binding on the delivery of the message, and so we can change the code at any time while the code is running.

OTP logger

OTP comes with two loggers. There’s error_logger, which was used prior to OTP 21, and logger, which was introduced in OTP 21. The book is written with error_logger, so that’s what I’ll use for now, but we can use logger instead at each point, as I’ll try to demonstrate. Let’s learn how to use it.

For us, the programmers, we can simply call the error logger to log an error.

10>error_logger:error_msg("An error has occurred\n").
=ERROR REPORT==== 14-May-2021::18:35:57.881601 ===
An error has occurred

ok
16> logger:error("this is an error").
logger:error("this is an error").
=ERROR REPORT==== 14-May-2021::19:11:30.650333 ===
this is an error
ok

How nice.

We can also use formatting to send formatted data. The semantics of the formatting is the same as using io:format.

9> error_logger:error_msg("An error ~p has occurred ~p~n", [15, 30]).
error_logger:error_msg("An error ~p has occurred ~p~n", [15, 30]).
=ERROR REPORT==== 14-May-2021::18:37:20.626188 ===
An error 15 has occurred 30

ok
15> logger:error(["this is an error", {why, "testing"}, cosmic_ray_123]).
=ERROR REPORT==== 14-May-2021::19:10:49.250956 ===
FORMAT ERROR: "~ts" - [["this is an error",{why,"testing"},cosmic_ray_123]]
ok

We can also use error_report to send a standard error report to the error logger.

%% We have a function called error_logger:error_report which we can call
%% to log the report. The semantics of the function are intuitive, but
%% hard to describe.

-spec error_logger:error_report(Report) -> ok.

%% Report is a list of items. In the report, we can either pass a
%% term, a string, or a tuple {Tag, Data}, where Tag is a term, and
%% Data is also a term.

%% here's an example of a call.
error_logger:error_report(["this is an error", {why, whynot}, cosmic_ray_error]).

Let’s call this in the shell.

10> error_logger:error_report(["this is an error", {why, whynot}, cosmic_ray_error]).
=ERROR REPORT==== 14-May-2021::18:42:44.954436 ===
    "this is an error"
    why: whynot
    cosmic_ray_error
ok
11> error_logger:error_report(["this is an error", {why, 12345}, cosmic_ray_error]).
=ERROR REPORT==== 14-May-2021::18:42:50.561243 ===
    "this is an error"
    why: 12345
    cosmic_ray_error
ok
12> error_logger:error_report(["this is an error", {why, "testing"}, cosmic_ray_error]).
=ERROR REPORT==== 14-May-2021::18:42:54.650653 ===
    "this is an error"
    why: testing
    cosmic_ray_error
ok

Here you can see that the terms can also be integers and strings and so on.

The logger is configurable, and so we can change its behavior by passing some arguments to erlang. There are a few options available to us. By default, the error logger just prints the errors to the shell; alternatively, it can write all the errors to a formatted text file, or keep a rotating log. A rotating log is simply a large FIFO which lets us store the last few days’ worth of logs, deleting the older ones as we go. You can configure the rotating log to specify how many log files should be kept, and how large each individual log file should be. After specifying that, erlang handles the rest for you.

To use the loggers, we can pass some arguments to erlang as we boot:

  • erl -boot start_clean

    This is the default way to boot erlang. It’s an environment suited for development, but not much else.

  • erl -boot start_sasl

    This is the environment which is capable of running a production system. SASL stands for System Architecture Support Libraries, and takes care of error logging, overflow protection, and some more things. In particular, it provides alarm_handler, release_handler, and systools.

The configuration of this is best done from config files, because passing config as shell arguments is clunky, error prone, and annoying.

There’s a number of kinds of reports that erlang produces, but here are just a few of them:

  • Supervisor reports are issued when a supervised process terminates abnormally.
  • Progress reports are issued when an OTP supervisor starts or stops a supervised process.
  • Crash reports are made when a process started by OTP terminates with a reason other than normal or shutdown.

These are all produced automatically.

We can also call some specific logger functions. We can call logger:error, logger:warning, logger:critical, logger:info, and so on. They all take the same arguments.

Configuring the logger

Let’s write a config file to configure the logger.

The book says the following should work, but I found that it doesn’t. I think it’s because it uses the old logging api, and my OTP is too recent for that.

[{sasl, [{sasl_error_logger, false}]}].

Here, we disable the sasl error logger. Let’s make it do something more interesting.

[{sasl, [{sasl_error_logger, {file, "/home/sean/notes/erlang/code/otp-app/reports.log"}}]}].

Now, when OTP starts/stops something, we’ll get reports in this file. Let’s configure the rotating log so that we can log errors, now.

[{sasl, [{sasl_error_logger, {file, "/home/sean/notes/erlang/code/otp-app/reports.log"}},
         {error_logger_mf_dir,"/Users/joe/error_logs"},
         %% # bytes per logfile
         {error_logger_mf_maxbytes,10485760}, % 10 MB
         %% maximum number of logfiles
         {error_logger_mf_maxfiles, 10} 
]}].

BUT, none of that actually worked for me. Instead, I used the following:

[{kernel,
  [{logger,
    [{handler, default, logger_std_h,  % {handler, HandlerId, Module,
      #{config => #{file => "logging.log"}}}  % Config}
    ]}]}].

And running this, we got what we wanted. Here, it puts the errors into logging.log.

rooty% erl -boot start_sasl -config elog1
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.2  (abort with ^G)
1> logger:error("test").
ok
2> 
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
rooty% cat logging.log 
=ERROR REPORT==== 14-May-2021::19:30:54.214700 ===
test
rooty% 

I don’t know how to do the rotating log thing with the new erlang logger, so I’m going to skip it for now.
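(For the record, skimming the logger_std_h docs suggests rotation is configured per handler, with max_no_bytes and max_no_files in the handler’s config map. I haven’t tested it, so treat the following as an assumption rather than a recipe:)

[{kernel,
  [{logger,
    [{handler, default, logger_std_h,
      #{config => #{file => "logging.log",
                    max_no_bytes => 10485760,  % rotate at ~10 MB
                    max_no_files => 10}}}      % keep 10 rotated files
    ]}]}].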

Alarm management

Let’s use the real OTP gen_event now.

-module(my_alarm_handler).
-behaviour(gen_event).

-export([init/1, code_change/3, handle_event/2, 
         handle_call/2, handle_info/2, terminate/2]).

%% init args returns {ok, State}
init(Args) ->
    io:format("*** ~p init: ~p~n", [?MODULE, Args]),
    {ok, 0}.

%% N is the state
handle_event({set_alarm, tooHot}, N) ->
    logger:error("*** turn on the fan, it's too hot!"),
    {ok, N + 1};
handle_event({clear_alarm, tooHot}, N) ->
    logger:error("*** we can turn off the fan again"),
    {ok, N};
handle_event(Event, N) ->
    logger:error("*** unknown event ~p~n", [Event]),
    {ok, N}.

%% returns {ok, Reply, State}
handle_call(_Request, N) ->
    Reply = N,
    {ok, Reply, N}.
handle_info(_Info, N) ->
    {ok, N}.

terminate(_Reason, _N) -> ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

Let’s start interacting with this.

1>alarm_handler:set_alarm(tooHot).
ok.
rooty% tail ./logging.log
=INFO REPORT==== 14-May-2021::19:54:22.967619 ===
    alarm_handler: {set,tooHot}

As you can see, we can set the alarm manually with alarm_handler:set_alarm(). Let’s swap the handler now, and trip the error.

rooty% erl -boot start_sasl -config elog1                                                             
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.2  (abort with ^G)
1> gen_event:swap_handler(alarm_handler, {alarm_handler, swap}, {my_alarm_handler, xyz}).
*** my_alarm_handler init: {xyz,{alarm_handler,[]}}
ok
2> alarm_handler:set_alarm(tooHot).
ok
3> 
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
^C%
rooty% tail logging.log 

=INFO REPORT==== 14-May-2021::19:54:22.967619 ===
    alarm_handler: {set,tooHot}
=ERROR REPORT==== 14-May-2021::19:57:35.938560 ===
*** turn on the fan, it's too hot!
rooty% 

Note what we did here. We started erlang with start_sasl, then we added our alarm handler as a handler for alarms (note we then saw our process print my_alarm_handler init with the arguments we passed it), and tripped the alarm tooHot, and saw that our handler was then called.

The application servers

The application we’re building today has two servers. A server which computes prime numbers, and a server which computes the areas of shapes.

Let’s write the prime server:

-module(prime_server).
-behaviour(gen_server).

-export([new_prime/1, start_link/0]).

%% here's all the gen_server stuff. You should remember this from last
%% time.
-export([init/1, handle_call/3, handle_cast/2, 
         handle_info/2, terminate/2, code_change/3]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Note 20k is a timeout
new_prime(N) ->
    gen_server:call(?MODULE, {prime, N}, 20000).

init([]) ->
    %% this ensures that terminate will be called when we halt
    process_flag(trap_exit, true),
    io:format("~p starting now~n", [?MODULE]),
    {ok, 0}.
    

handle_call({prime, K}, _From, N) ->
    {reply, make_new_prime(K), N + 1}.


handle_cast(_Msg, N) -> {noreply, N}.
handle_info(_Info, N) -> {noreply, N}.
terminate(_Reason, _N) -> io:format("~p stopping~n",[?MODULE]), ok.
code_change(_OldVsn, N, _Extra) -> {ok, N}.

make_new_prime(K) ->
    if
        K > 100 ->
            alarm_handler:set_alarm(tooHot),
            N = lib_primes:make_prime(K),
            alarm_handler:clear_alarm(tooHot),
            N;
        true ->
            lib_primes:make_prime(K)
    end.

And, let’s copy paste this and write the area server.

-module(area_server).
-behaviour(gen_server).

-export([area/1, start_link/0]).

%% here's all the gen_server stuff. You should remember this from last
%% time.
-export([init/1, handle_call/3, handle_cast/2, 
         handle_info/2, terminate/2, code_change/3]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Note 20k is a timeout
area(Thing) ->
    gen_server:call(?MODULE, {area, Thing}, 20000).

init([]) ->
    %% this ensures that terminate will be called when we halt
    process_flag(trap_exit, true),
    io:format("~p starting now~n", [?MODULE]),
    {ok, 0}.
    

handle_call({area, {square, S}}, _From, N) ->
    {reply, S * S, N + 1};
handle_call({area, {rectongle, B, H}}, _From, N) ->
    {reply, B * H, N + 1}.


handle_cast(_Msg, N) -> {noreply, N}.
handle_info(_Info, N) -> {noreply, N}.
terminate(_Reason, _N) -> io:format("~p stopping~n",[?MODULE]), ok.
code_change(_OldVsn, N, _Extra) -> {ok, N}.

Note that there is a mistake somewhere in the code, and this mistake is something that we’re gonna fish out in a few seconds.

Let’s set up a supervision tree so that we can supervise this at runtime.

Supervision

Commonly, there are two types of trees. In both types, we have supervisors watching workers.

In the first type, if one worker crashes, we can restart that worker. This is called a one-for-one supervision tree.

In the second type, if one worker crashes, we have to restart all the workers under the supervisor. This is a one-for-all supervision tree. Both are fairly trivial to implement with erlang’s multiprocessing primitives.

Supervisors are created with OTP’s supervisor behavior. To use it, we can specify it with a function of the following form:

%% RestartStrategy is either one_for_one, or one_for_all.

%% MaxRestarts is the maximum number of crashes within Time seconds
%% that we can support. We don't want a process crashing over and over
%% and over again, so this puts a lid on it.

%% the workers are tuples which specify how to start up the workers,
%% which we'll get to in a sec

init(_) ->
    {ok, {{RestartStrategy, MaxRestarts, Time},
          [Worker1, Worker2, Worker3, ...]}}.

Let’s write the supervisor:

-module(super_supervisor).
-behaviour(supervisor).
-export([start/0, start_in_shell_for_testing/0, start_link/1, init/1]).

start() ->
    spawn(fun () ->
                  supervisor:start_link({local, ?MODULE}, ?MODULE, _Arg = [])
          end).

start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link({local, ?MODULE}, ?MODULE, _Arg = []),
    unlink(Pid).

start_link(Args) ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, Args).

init([]) ->
    gen_event:swap_handler(alarm_handler, {alarm_handler, swap}, {my_alarm_handler, xyz}),
    {ok, {{one_for_one, 3, 10}, %% at most 3 restarts within 10 seconds
          [{tag1,
            {area_server, start_link, []},
            permanent, 10000, worker, [area_server]},
           {tag2,
            {prime_server, start_link, []},
            permanent, 10000, worker, [prime_server]}]}}.

Worker specifications (the stuff that we were talking about earlier and returned from the init function) are as follows:

{Tag, {Mod, Func, Args}, 
 Restart, Shutdown, Type, [Mod1]}

The Tag is just something that we can use to refer to the process.

The {MFA} tuple is the function that the supervisor will use to start the worker.

The Restart is what happens when a process crashes. A permanent process is always restarted. A transient process is restarted if it returns a non-normal exit value. A temporary process is not restarted.

Shutdown is the shutdown time: if a worker takes longer than this to shut down, it is killed.

Type can be either worker or supervisor. We can then construct a tree of supervisors.

The last list, [Mod1], names the callback module, in the case where the child process is a supervisor or a gen_server.

Starting it up

Ok, let’s start it up.

rooty% erl -boot start_sasl -config elog1                                                                                      
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]                                     

Eshell V11.2  (abort with ^G)                                                                                                  
1> super_supervisor:start_in_shell_for_testing().                                                                              
*** my_alarm_handler init: {xyz,{alarm_handler,[]}}                                                                            
area_server starting now                                                                                                       
prime_server starting now                                                                                                      
true                                                                                                                           
2> area_server:area({rectangle, 20, 3}).                                                                                       
area_server stopping                                                                                                           
area_server starting now 
** exception exit: {{function_clause,[{area_server,handle_call,
                                                   [{area,{rectangle,20,3}},
                                                    {<0.91.0>,#Ref<0.4170266090.3273392129.6924>},
                                                    0],
                                                   [{file,"/home/sean/notes/erlang/code/otp-app/area_server.erl"},
                                                    {line,24}]},
                                      {gen_server,try_handle_call,4,
                                                  [{file,"gen_server.erl"},{line,715}]},
                                      {gen_server,handle_msg,6, 
                                                  [{file,"gen_server.erl"},{line,744}]},
                                      {proc_lib,init_p_do_apply,3,
                                                [{file,"proc_lib.erl"},{line,226}]}]},
                    {gen_server,call,
                                [area_server,{area,{rectangle,20,3}},20000]}}
     in function  gen_server:call/3 (gen_server.erl, line 246)
3> area_server:area({rectongle, 20, 3}).
60
4> area_server:area({square, 10}).
100
5> prime_server:new_prime(10).
prime_server stopping
prime_server starting now
** exception exit: {{undef,[{lib_primes,make_prime,"\n",[]},
                            {prime_server,handle_call,3,
                                          [{file,"/home/sean/notes/erlang/code/otp-app/prime_server.erl"},
                                           {line,25}]},
                            {gen_server,try_handle_call,4,
                                        [{file,"gen_server.erl"},{line,715}]},
                            {gen_server,handle_msg,6,
                                        [{file,"gen_server.erl"},{line,744}]},
                            {proc_lib,init_p_do_apply,3,
                                      [{file,"proc_lib.erl"},{line,226}]}]},
                    {gen_server,call,[prime_server,{prime,10},20000]}}
     in function  gen_server:call/3 (gen_server.erl, line 246)
6>                            

Note a few things. For one, the call with the correctly-spelled {rectangle, 20, 3} crashed: the mistake hinted at earlier is that area_server's handle_call spells the atom rectongle, so the correct spelling doesn't match any clause. The supervisor restarted the crashed worker for us, which is why the follow-up calls, including the misspelled {rectongle, 20, 3}, still get answers.
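
For reference, the offending clause presumably looks something like the sketch below. This is reconstructed from the crash report and the results above, not copied from the actual area_server.erl; fixing the bug is just a matter of spelling rectangle correctly.

%% Sketch of the buggy handle_call, reconstructed from the shell session above.
handle_call({area, {rectongle, Width, Ht}}, _From, N) ->  % typo: should be rectangle
    {reply, Width * Ht, N+1};
handle_call({area, {square, Side}}, _From, N) ->
    {reply, Side * Side, N+1}.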

Note that we also see a crash from prime_server, because lib_primes is missing. I got rid of the calls to lib_primes and swapped them out for hardcoded numbers, keeping the alarm functionality (there's a rough sketch of that change after the log output below). You can see me running this here:

rooty% erl -boot start_sasl -config elog1
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.2  (abort with ^G)
1> super_supervisor:start_in_shell_for_testing().
*** my_alarm_handler init: {xyz,{alarm_handler,[]}}
area_server starting now
prime_server starting now
true
2> prime_server:    
code_change/3  handle_call/3  handle_cast/2  handle_info/2  init/1         
module_info/0  module_info/1  new_prime/1    start_link/0   terminate/2    

2> prime_server:new_prime(10).
{this_isnt_a_prime_lmao,123411234}
3> prime_server:new_prime(120).
{this_also_isnt_a_prime_but_its_hot,1234123}
4> 
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
^C%                                                       

And, looking at the log file, we can see:

=ERROR REPORT==== 14-May-2021::20:50:35.163871 ===
*** turn on the fan, it's too hot!
=ERROR REPORT==== 14-May-2021::20:50:35.164132 ===
*** we can turn off the fan again

Looks like everything works as it should!
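
For reference, here's roughly what the stubbed-out prime_server request handler might look like. The return tuples come from the shell session above, but the helper name make_new_prime and the > 100 alarm threshold are assumptions, not the actual source:

%% Sketch only -- reconstructed from the shell output and log above.
handle_call({prime, K}, _From, N) ->
    {reply, make_new_prime(K), N+1}.

make_new_prime(K) when K > 100 ->
    %% "big" request: raise the alarm, do the (fake) work, clear the alarm.
    %% my_alarm_handler turns these events into the log messages we saw above.
    alarm_handler:set_alarm(tooHot),
    Reply = {this_also_isnt_a_prime_but_its_hot, 1234123},
    alarm_handler:clear_alarm(tooHot),
    Reply;
make_new_prime(_K) ->
    {this_isnt_a_prime_lmao, 123411234}.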

Bundling it all up

Let’s do the last step to make this a fully-fledged OTP application.

To do this, we need to create a .app file, which contains metadata about the application. Since the application is called sellaprime, the file is named sellaprime.app:

{application, sellaprime,
 [{description, "The best way to sell fake prime numbers"},
  {vsn, "1.0"},
  {modules, [sellaprime_app, super_supervisor, area_server,
             prime_server, my_alarm_handler]},
  {registered,[area_server, prime_server, super_supervisor]},
  {applications, [kernel,stdlib]},
  {mod, {sellaprime_app,[]}},
  {start_phases, []}
 ]}.

And, let’s make a sellaprime_app module which handles startup and stopping.

-module(sellaprime_app).
-behaviour(application).
-export([start/2, stop/1]).
start(_Type, StartArgs) ->
    super_supervisor:start_link(StartArgs).
stop(_State) ->
    ok.

Here you can see me loading and starting the app (the shell is started from the directory containing the compiled modules and the sellaprime.app file, so everything is on the code path).

rooty% erl -boot start_sasl -config elog1
Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V11.2  (abort with ^G)
1> application:load(sellaprime).
ok
2> application:start(sellaprime).
*** my_alarm_handler init: {xyz,{alarm_handler,[]}}
area_server starting now
prime_server starting now
ok
3> area_server:area({rectangle, 20, 10}).
200
4> application:loaded_applications().
[{sellaprime,"The best way to sell fake prime numbers",
             "1.0"},
 {sasl,"SASL  CXC 138 11","4.0.2"},
 {kernel,"ERTS  CXC 138 10","7.3"},
 {stdlib,"ERTS  CXC 138 10","3.14.1"}]
5>