The only way to solve this problem is to build abstractions into your programs. We will review two such methods in Erlang. The idea of abstraction, informally, is that we hide certain details and only provide a clean interface through which to manipulate the data. Erlang is a "Mutually Consenting Adult Language" (read: dynamically typed with full term introspection - or more violently - unityped crap with everything in one big union type). So the language itself cannot enforce this kind of abstraction. On the other hand, the dialyzer can provide us with much of the necessary tooling for abstraction.
As an example of so-called modular abstraction, let us consider a small toy module:
-module(myq).
-export([push/2,
         empty/0,
         pop/1]).
-type t() :: {fifo, [integer()], [integer()]}.
-export_type([t/0]).
-spec empty() -> t().
-spec push(integer(), t()) -> t().
-spec pop(t()) -> 'empty' | {'value', integer(), t()}.
These are the definitions and specs of the module we are implementing. We are writing a simple module for a FIFO queue, based on two lists that are kept back-to-back. I am using a Standard ML / OCaml trick here by naming the canonical type the module operates on 't'. The operations push/2 and pop/1 push elements onto and pop elements off the queue, respectively. Note that we tag queues with the atom 'fifo' to discriminate them from other tuples. The implementation of the queue is equally simple:
empty() -> {fifo, [], []}.
push(E, {fifo, Front, Back}) -> {fifo, Front, [E | Back]}.
pop({fifo, [E | N], Back}) -> {value, E, {fifo, N, Back}};
pop({fifo, [], []}) -> empty;
pop({fifo, [], Back}) -> pop({fifo, lists:reverse(Back), []}).
We always push to the back list and always pop from the front list. If the front list ever becomes empty, we reverse the back list and make it the new front. As long as the queue is not used persistently (that is, old versions of a queue are not reused), each operation runs in amortized O(1) time, which makes it pretty fast.
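To make the back-to-front reversal concrete, here is a small trace of the queue in action. It is only a sketch: the module name myq_demo and the function run/0 are made up for illustration, and the queue operations are copied locally so that the file compiles by itself.

```erlang
-module(myq_demo).
-export([run/0]).

%% Local copies of the queue operations from the myq module above,
%% repeated here only so that this sketch is self-contained.
empty() -> {fifo, [], []}.

push(E, {fifo, Front, Back}) -> {fifo, Front, [E | Back]}.

pop({fifo, [E | N], Back}) -> {value, E, {fifo, N, Back}};
pop({fifo, [], []}) -> empty;
pop({fifo, [], Back}) -> pop({fifo, lists:reverse(Back), []}).

%% Hypothetical driver: push 1, 2, 3, then interleave pops and a push.
run() ->
    Q3 = push(3, push(2, push(1, empty()))),  % back list is [3,2,1]
    {value, 1, Q4} = pop(Q3),  % front was empty: back is reversed first
    Q5 = push(4, Q4),          % front [2,3], back [4]
    {value, 2, Q6} = pop(Q5),  % cheap pops from the front list
    {value, 3, Q7} = pop(Q6),
    {value, 4, Q8} = pop(Q7),  % front empty again: second reversal
    empty = pop(Q8),
    %% Return a few intermediate representations for inspection.
    [Q3, Q4, Q8].
```

Each element is moved from the back list to the front list exactly once, which is where the amortized bound comes from.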
The neat thing is that all operations on queues are local to the myq module. This abstracts away the details of queues when you use them via this module. There can be much code inside such a module which is never exposed to the outside, and thus we have an easier time managing the program.
There is a problem with this though, which is that the implementation of the queue is transparent. A user of the myq module, when handed a queue Q of type myq:t(), can discriminate on its representation, like this user:
-module(u).
-compile(export_all).
-spec f(myq:t()) -> myq:t().
f(Q) ->
    case Q of
        {fifo, [], []} ->
            myq:push(7, Q);
        _Otherwise ->
            Q
    end.
Note how we match on the queue and manipulate it. This is bad practice! If the myq module defines the representation of the queue, it ought to be the only module that manipulates the internal representation of a queue. Otherwise we might lose the modularity, since the representation has bled all over the place. Now, since Erlang is for mutually consenting adults, you need to make sure yourself that this leak of the data representation doesn't happen. It is especially important with records. If you want modular code, avoid putting records in included header files if possible, unless you are dead sure the representation won't change all of a sudden. Otherwise the record will bleed all over your code and make it harder to change stuff later on. Changes are then no longer module-local but spread over several modules, which hurts the reusability of the code.
However, the dialyzer has a neat trick! If we instead of
-type t() :: {fifo, [integer()], [integer()]}.
had defined the type as opaque
-opaque t() :: {fifo, [integer()], [integer()]}.
then the dialyzer will report the following when run on the code:
u.erl:9: The call myq:push(7,Q::{'fifo',[],[]}) does not have an
opaque term of type myq:t() as 2nd argument
which is a warning that we are breaking the opaqueness abstraction of the myq:t() type.
The other kind of abstraction in Erlang
Languages like Haskell or ML have this kind of trick up their sleeve in the type system. You can enforce a type to be opaque and get type errors if a user tries to dig into the structure of the representation. Since the dialyzer came later to Erlang, one might wonder how one could write programs larger than a million lines of code in Erlang and get away with it when there was no enforcement of opaqueness. The answer is subtle and peculiar. Part of the answer is naturally the functional heritage of Erlang. Functional languages tend to have excellent reusability properties because the task of handling state is diminished. Also, functional code tends to be easier to maintain since it is much more data-flow oriented than control-flow oriented. But Erlang has another kind of abstraction which is pretty unique to it, namely that of a process:
If I create a process, then its internal state is not observable from the outside. The only thing I can do is to communicate with the process by protocol: I can send it a message and I can await messages from it. This makes the process abstract when viewed from the outside. The internal representation is not visible, and you could completely substitute the process for another one without the caller knowing. In Erlang this principle of process isolation is key to the abstractional facilities.
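As a rough illustration of a process acting as an abstraction boundary, here is a queue hidden behind a process. This is a sketch under assumptions: the module name qproc, its function names and the message shapes are invented for this example, and the state happens to use the stdlib queue module, which could be swapped for the two-list representation without any caller noticing.

```erlang
-module(qproc).
-export([start/0, push/2, pop/1, stop/1]).

%% Hypothetical example module: callers only ever see a pid and the
%% messages below, never the data structure held by the process.

start() ->
    spawn(fun() -> loop(queue:new()) end).

%% Asynchronous push: fire and forget.
push(Pid, E) ->
    Pid ! {push, E},
    ok.

%% Synchronous pop: the unique reference ties the reply to this request.
pop(Pid) ->
    Ref = make_ref(),
    Pid ! {pop, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        exit(timeout)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The internal state Q is private to this loop.
loop(Q) ->
    receive
        {push, E} ->
            loop(queue:in(E, Q));
        {pop, From, Ref} ->
            case queue:out(Q) of
                {{value, E}, Q1} ->
                    From ! {Ref, {value, E}},
                    loop(Q1);
                {empty, Q1} ->
                    From ! {Ref, empty},
                    loop(Q1)
            end;
        stop ->
            ok
    end.
```

From the shell, something like Pid = qproc:start(), qproc:push(Pid, 1), qproc:pop(Pid) exercises the protocol. The only contract is the set of messages, so the representation inside loop/1 can change freely.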
What does this mean really?
Erlang has not one, but two ways to handle large applications. You can use modules, exports of types and opaqueness constraints to hide representations. While you can break the abstraction, the dialyzer will warn you when you are doing so. This is a compile-time, program-code abstractional facility. Orthogonally to this, a process is a runtime isolation abstraction. It enforces a given protocol at run time, which you must abide by. It can hide the internal representation of a process, so it provides an abstractional facility as well. It is also the base of fault tolerance. If a process dies, only its internal state can be directly affected. Other processes not logically bound to it can still run. It is my hunch that these two tools together are invaluable when it comes to building large Erlang programs, several hundred thousand lines of code - and getting away with it!
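The fault-isolation point can be seen in a few lines. The following is a minimal sketch, with the module name isolation_demo and the exit reason boom chosen purely for illustration: a process that is monitored, but not linked, crashes, and the observer merely receives a message about it.

```erlang
-module(isolation_demo).
-export([run/0]).

%% Hypothetical demo: spawn a process that dies immediately and watch it
%% through a monitor. The caller is not linked to it and keeps running.
run() ->
    {Pid, MRef} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Only the crashed process's own state is gone.
            {still_alive, Reason}
    after 1000 ->
        timeout
    end.
```

Linking the two processes instead would bind them logically, which is the mechanism supervisors build on.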
So in conclusion: To create modular code-level functional abstractions, declare your types opaque and rely on the dialyzer to check them for you, as in the queue example above. To create a modular runtime, split your program into processes, where each process handles a concurrent task.
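As a closing sketch of the first point, here is one way the queue module could look with the opaque type in place. The is_empty/1 function is an assumed addition that the module above does not have; it is there so that clients have no reason to match on the representation.

```erlang
-module(myq).
-export([empty/0, is_empty/1, push/2, pop/1]).
-export_type([t/0]).

%% Opaque: other modules may pass a t() around but not look inside it.
-opaque t() :: {fifo, [integer()], [integer()]}.

-spec empty() -> t().
empty() -> {fifo, [], []}.

%% Assumed addition for illustration, not part of the original module.
-spec is_empty(t()) -> boolean().
is_empty({fifo, [], []}) -> true;
is_empty({fifo, _, _}) -> false.

-spec push(integer(), t()) -> t().
push(E, {fifo, Front, Back}) -> {fifo, Front, [E | Back]}.

-spec pop(t()) -> 'empty' | {'value', integer(), t()}.
pop({fifo, [E | N], Back}) -> {value, E, {fifo, N, Back}};
pop({fifo, [], []}) -> empty;
pop({fifo, [], Back}) -> pop({fifo, lists:reverse(Back), []}).
```

With this in place, the function f/1 in the user module could branch on myq:is_empty(Q) instead of matching on {fifo, [], []}, which should keep the dialyzer quiet because the client never inspects the tuple.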
