Copyright (c) 2022 John Krukoff
Version: 1.2.0
Authors: John Krukoff ([email protected]
).
lfiles |
llists |
llists_utils |
An Erlang/OTP library for working with iterators; an opaque type that is evaluated as a lazily evaluated list of elements. This allows for memory efficient processing of sequential data and easy composition of processing on such lazy lists.
An interface identical to the stdlib lists
module is implemented, allowing
for easy conversion between list processing code and lazy list processing
code.
This library is published to hex.pm as llists. If you're using rebar3 as your build tool, it can be added as a dependency to your rebar.config as follows:
{deps, [{llists}]}.
To use this library, it is first necessary to create an iterator. This can
either be done from an existing data structure using functions like
llists:from_list/1
or by expressing the continuation programmatically using
functions like llists:unfold/2
.
Once an iterator is constructed, it can then be evaluated an element at a time
using llists:next/1
or by using one of the many utility and evaluation
functions in llists
or llists_utils
.
llists
contains an iterator aware version of every function in the stdlib
lists
module, with guaranteed identical behaviour for side effect free and
finite iterators.
lfiles
contains file utility functions replicating functionality from the
file
and io
modules. Read functions create impure iterators that should
only be evaluated once. Write functions consume iterators and write the
produced data to a file.
An iterator is an opaque record created by the llists
module to represent a
continuation. When evaluated by llists:next/1
an iterator returns a lazy
list, represented by one of two options:
-
[]
: An empty list, signaling that no more elements are available. -
[Elem | Iterator]
: An improper list consisting of an element and an iterator to continue evaluation.
As an example task, let us calculate the mean difference between a list of
integer pairs. These pairs are stored in a file as
comma separated values, one per line. We can use the llists
module to both
read lines from the file and calculate the average lazily, thus avoiding
loading the entire file into memory during computation.
First, we need to construct the iterator:
> {ok, File} = file:open("doc/example.txt", [read]).
{ok,<0.227.0>}
> I = llists:unfold(fun(File) ->
case file:read_line(File) of
{ok, Data} ->
{Data, File};
eof ->
file:close(File),
none
end
end, File).
{iterator,#Fun<llists.2.38967554>}
Next, a loop to parse the strings and calculate the mean difference:
> F = fun Calculate(I, Sum, Count) ->
case llists:next(I) of
[] ->
Sum / Count;
[Elem | Next] ->
[A, B] = [list_to_integer(string:trim(Part)) ||
Part <- string:split(Elem, ",")],
Calculate(Next, Sum + (A - B), Count + 1)
end
end.
#Fun<erl_eval.17.3316493>
> F(I, 0, 0).
-0.42
We could also make use of the utility functions in llists
and lfiles
and
compose the same result as follows:
> {ok, File} = file:open("doc/example.txt", [read]).
{ok,<0.227.0>}
> I = lfiles:read_line(File).
{iterator,#Fun<llists.2.38967554>}
> Split = llists:map(fun (Elem) ->
string:split(Elem, ",")
end, I).
{iterator,#Fun<llists.23.38967554>}
> Integers = llists:map(fun (Parts) ->
[list_to_integer(string:trim(Part)) || Part <- Parts]
end, Split).
{iterator,#Fun<llists.23.38967554>}
> {Sum, Count} = llists:foldl(fun ([A, B], {Sum, Count}) ->
{Sum + (A - B), Count + 1}
end, {0, 0}, Integers).
{-42,100}
> file:close(File).
ok
> Sum / Count.
-0.42
In both examples, we read only a single line of the file into memory at a time.
Notice that we couldn't use llists:sum/1
and llists:length/1
here instead
of llists:foldl/3
, since our input iterator has side effects and can only be
evaluated once.
Please fork the repo and submit a PR. Tests are run automatically on the master branch by GitHub actions or can be run locally via:
make deps
make check
If a Unix environment is not available, tests can be run inside a docker container via:
docker-compose build
docker-compose run check
Documentation is autogenerated using edown and edoc via:
make doc
Development of the library should be done with an Erlang/OTP version of 25 or greater.
The library requires an Erlang/OTP version of 21 or greater to function. Earlier versions may work if functions involving maps are avoided.
Streams and lazily evaluated lists are common language constructs and much prior art exists. Scheme's SRFI-41 served as a useful design document to base this work on.
Other implementations that were used for reference:
-
Elixir's standard library Stream module.
-
The Erlang stream module from the Datum library.
-
The zlist Erlang library.
-
The infinite lists example from the Erlang documentation.