Erlang Style Concurrency

Thursday, August 10, 2006

Introduction

On an evolutionary scale of innovation from one to ten (one being Bloomberg and Citi Group, eight being Google and Cirque Du Soleil, and ten being the company you couldn't imagine in your wildest dreams), the company I work for is about a three¹. Being employed by this bastion of ingenuity affords me certain opportunities I can't get elsewhere. For example, every developer gets to interview potential candidates he might end up working with. During our last round of these interviews I met a lot of developers from various backgrounds and with different amounts of experience.

Now, our company is a typical software company. We think that developing our, frankly trivial, software requires cream of the crop developers and our upper management continuously boasts about the tremendous amount of talent it has assembled in our office. Of course all our developers are average. We have no Einsteins. There are no visionaries that change the face of the industry. The halls are filled with the burden of monotony rather than the energy of innovation. Our developers are familiar only with mainstream technologies, watered down and sanitized for the general public. Naturally potential candidates that come in for interviews come from a similar breed.

Given this state of affairs I wasn't too surprised when candidate after candidate failed to arrive at relatively simple conclusions. Everyone likes to pride themselves as outside the box thinkers but it turns out that very few people are actually capable of thinking even the most innocent thoughts if they didn't hear about them from some authority (either the mainstream collectively referred to as "they" or some person perceived as an expert). In the case I'm about to describe, the difference between me and the people I've interviewed was that I was fortunate enough to stumble upon the right resources supported by a reasonably persistent group of advocates. I can only suggest to try and find the very best and learn from them. I'm in the process of turning this habit into a religious ceremony I try to perform every chance I get.

So what was the big question I asked that people couldn't properly handle? It was actually pretty simple. "Please describe some of the challenges you've encountered while working with multithreaded software". That's it. Every single time I uttered this phrase I received the same answer: "Deadlocks and race conditions". It was always a dead end from there. I tried to lead people, gave suggestions, gave them the answer straight out. All I got was weird looks. The following is a one act play that pretty much sums up where my question went every single time.

ME:
How do you deal with these problems?
CANDIDATE:
I make sure to lock everything and take special care to
lock and unlock things in the same order.
ME:
So why do you keep encountering these problems, then?
CANDIDATE:
Because humans inevitably make mistakes.
ME:
Couldn't your solution to the problem account for that?
CANDIDATE:
Unfortunately I don't know of any automatic tools that
handle this.
ME:
Why not just eliminate locks all together?
The candidate greets me with a blank stare.
CANDIDATE:
How would I ensure two threads don't modify the same
data simultaneously?
ME:
Don't share data between threads.
The candidate pauses for even more time and stares at me.
CANDIDATE:
How would threads communicate, then?
ME:
How does your web browser communicate with the web server?
The candidate seems to make up his mind and looks at me as if I'm
completely insane.

I never got the chance to explain in detail what exactly I meant. There were many other things to get to in the half hour we were allotted for the interview. I limited myself to suggesting the candidate look up Erlang and moved on. This article is my belated attempt to explain Erlang Style Concurrency. Better late than never.

Historical Background

It is said that the low hanging fruit is picked early. The field of Computer Science is consistent with this saying. A few smart people quickly figured out what could and could not be computed before computers were even invented. Once the hardware came along the best discoveries of computer science didn't take a long time to follow. The high hanging fruit proved to be much harder to pick. MIT AI laboratories have been working on solutions to some rather trivial (from layman's perspective) AI problems and with some notable exceptions (also discovered early) have produced nothing groundbreaking in more than thirty years.

Much of what we know about concurrent programming grows on the lower branches. A short recourse to Google during lunch break reveals that almost all important work on concurrency has been done early in Computer Science history. In 1971 Edsger Dijkstra posed and solved what was later referred to as the Dining Philosophers Problem. His solution involved a construct commonly known as a semaphore. For the next few years the problem has been restated and solved in fundamentally different ways by a number of computer scientists. Everything after that was commentary.

Dijkstra's solution is often referred to as shared state concurrency. Surprisingly, under this model the threads must share common state among each other and can only access it via some form of locking primitives. Many of the other solutions didn't involve sharing state. One such solution is referred to as message passing concurrency - a model in which threads share no state and communicate with each other via asynchronous messaging. There were others, each with its advantages and disadvantages. Shared state concurrency gave the most low level control to the programmer and if used properly was the most efficient. On the other hand message passing concurrency provided high level mechanisms for thread communication but afforded little low level control to the programmer.

Of course in practice concurrency models mean little without a programming language and an operating system. In the early 1970s Dennis Ritchie designed C for a rewrite of Unix. C provided a thin, portable, reasonably expressive layer over the assembly language. Due to its initial purpose C included no features that couldn't easily be ported across architectures. It included no concurrency mechanisms. Unix picked up where C left off. It needed to provide portable and reasonably efficient concurrency primitives. Dijkstra's solution fell right in place. Eventually C++ followed in C's footsteps and Microsoft Windows followed in Unix's. Vast majority of programmers were familiar only with these systems and knew little (if anything) about other languages that implemented concurrency differently.

In 1995 Sun Microsystems took a heroic step and introduced Java, a language without pointers, to the mainstream. The rationale was that vast majority of C++ bugs were in some way related to pointers, and so, the solution was to retire pointers into the vast realms of history. The idea was met with a healthy degree of skepticism. People couldn't believe this will really work. It did. The idea, of course, wasn't new. By the time Sun came out with Java, Lisp machines already saw their peak and died (yes, nothing new has been invented since Lisp). There was Smalltalk and ML. There were others. None of that mattered - to the general public Java seemed innovative. Sun didn't even have to try and hide the sources of their creativity - no one cared.

For many people Java was too innovative. With lack of pointers, introduction of a virtual machine, garbage collection, and a wealth of other features that slowed mainstream adoption, a little more innovation would tip the language over the invisible edge - Java would be shrugged off as too "academic" and quickly forgotten. High level, innovative concurrency model could wait. Sun was betting its business on Java. It couldn't afford to take too much risk. So Java designers stopped there. Aside from JIT no more innovative features have ever made it into future Java releases. We can only infer that Sun doesn't think the public is ready for a more advanced language.

Not everyone was as cautious as Sun. In 1993, two years before Java was released, Erlang Systems was founded. A result of almost seven years of research, Erlang looked like a language designed by people who weren't afraid of too much innovation. Erlang had really innovative features. Among them was a message passing concurrency model, many thousands of concurrent threads on commodity hardware, transparent and scalable distributed programming. Ericsson engineers started using Erlang to program telecommunications switches. This is software that has some of the lowest downtime in the world. Today Erlang is used for everything from Ericsson's switches to instant messaging servers to airport traffic control systems. It is also the subject of this article.

Eliminating The Problem

Let's redesign Java with message passing concurrency in mind. Throughout the article we'll use familiar syntax and principles. If you're interesting in learning how Erlang actually implements these principles you'll find plenty of resources with your favorite search engine (let's pretend for the moment that there is more than one good search engine).

Erlang is to locks what Java is to pointers. Erlang designers realized that Dijkstra style locks are a low level mechanism that doesn't belong in a high level language. Locks were the first to go. So let's remove all references to locks from Java. The synchronize keyword goes. So do notify/wait functions. All we're left with are pure threads.

Of course we now have a problem. What happens if two threads try to write to the same object? Sooner or later this will result in a race condition. Let's take care of this as well. From this moment on Java threads start with separate heaps. Every time you spawn a thread a new heap is created and associated with that thread. All objects instantiated in a particular thread will reside in the heap of that thread. There is no way for a thread to access an object outside of its heap. This means that threads can't share references to objects. In modern operating systems a thread is simply a code path that can be executed concurrently with other code paths. A process is a collection of threads and other resources (including program state). We've now erased the distinction! Our threads can no longer share resources, including memory. We can rename threads to avoid confusion. From now on we'll call them processes.

We can now spawn "virtual" processes by creating threads that have different heaps (and therefore different address spaces). This system gives us a clean slate: our programs can no longer have deadlocks and race conditions. But we can't do anything complex either, our threads have no means to communicate. Let's fill this gap with high level communication primitives!

Filling The Void

A Java Thread class (from here on Process class) has a static currentThread() (from here on currentProcess()) member function. At any given time you could type Process.currentProcess() and at runtime you'll get a reference to an instance of a Process object for the process that executed that line. It would be nice if this object would provide us with some ability to safely communicate with other processes. We can't share memory, but we want to be able to send a message. Let's add a void sendMessage(Process receiverProcess, Object message) function to the Process class. At any given time we should be able to write this:

Process receiverProcess = ...;
Object message = new SomeMessage();
Process.currentProcess().sendMessage(receiverProcess, message);

We just let sendMessage know which process we want the message to go to, pass it any object that the process on the other end will receive and voila, we're done. But what about the receiving end? Well, if we added sendMessage() to send, we should add receiveMessage() to receive:

Object message = Process.currentProcess().receiveMessage();

The receiveMessage() function will block until the process has any new messages, at which point it will return the object passed by the sender to sendMessage(). Pretty simple, except what happens if more messages come in while the process is doing something? Well, we'll take the most obvious solution. Upon creation each process will get its very own heap and its very own queue. When someone calls sendMessage() it will simply append a message to the receiver's queue. A process will be able to respond to messages in the following manner:

while(true) {
    Object msg = Process.currentProcess().receiveMessage();
    if(msg instanceOf QuitMessage) {
        break;
    } else {
        // do something
    }
}

Pretty simple, and conceptually very similar to Windows messages.

If you've been paying attention you'll notice that we have a small problem. Our sender passes a reference to an object to sendMessage() which is then added to the receiver's queue. When the receiver will get the message it will have a reference to an object in another heap - no good since he can't access it! Our language machinery will have to take care of this problem. If you think about it long enough (I did) you'll realize that the only good way to get around this problem is for sendMessage() to serialize the message to a stream of bytes. The bytes would then be pushed on the queue of the receiver, and receiveMessage() function would deserialize the object. We, therefore, add one restriction: messages must be serializable. Our final version of sendMessage() is this:

void sendMessage(Process receiverProcess,
                 ISerializable message);

Inadvertently, we just gained another benefit. With some more behind-the-scenes work from our framework, our applications can transparently send messages to processes on remote machines. Our Process class can easily point to a process located half way around the globe. Since sendMessage() serializes the object we pass to it anyway, abstracting the transport protocol (memory, wire, etc.) is trivial in this context. We can now build distributed applications on top of our framework without any extra work (this really works in practice, you don't have to feel guilty about wanting to kill little puppies after dealing with J2EE style initial contexts all day long.) This is a huge benefit because every successful concurrent server will sooner or later have to become distributed since a single machine can't scale to infinity.

Distributed Processes

In the previous section we missed one important question. Should sendMessage() block until the message is received on the other end or should it return immediately? In other words, should the message passing mechanism behave synchronously or should we prefer asynchronous communication? If all our processes ran on the same machine the answer would be simple: it doesn't matter. It would be like experimenting with forces in a lab: you could use Newton's or Einstein's equations and you'll never see a difference. However, the minute your world becomes bigger all the little coefficients Newton skillfully avoided become important. Suddenly the safe, efficient, state of the art spaceship you've designed for cruising the stars becomes an unstable, slow piece of junk. The thing with concurrency is that sooner or later your world will become bigger. No matter how fast your server is, if you get enough clients you'll have to run another instance to handle the load. And more often than not these servers must communicate in order to avoid fragmentation (remember when Napster servers weren't connected and you couldn't get mp3s from your friends because they were logged in to another node?)

If you're familiar with DCOM or OMG CORBA (could they have picked a better acronym?) or RMI or any other "enterprise" caliber distributed invocation abomination you should recognize the Newton story. Despite the fact that nobody on this planet remembers (or cares to remember) the five million steps necessary for the ritual of creating a CORBA object, if you actually manage to follow the steps and set everything up on your machine, it works pretty well. If you're lucky, it might even work across your local network. Then you deploy your application in multiple data centers, get more than three concurrent users, and everything falls apart. You spend a week trying to figure out why you get random remote exceptions, and of course you fail. Then you settle for the next best thing and try to handle them properly. Of course you can't do that either. Finally your boss's boss calls up enterprise support and gets consultants on site. After months of tuning, and optimizing, and rearchitecting, and hundreds of thousands of dollars the application still doesn't work and the project is placed on "temporary" hold while the old COBOL system stays in operation "for a few more years". Before the consultants leave they tell you that distributed, concurrent applications are extremely complicated and you didn't architect the application correctly, and you're left wondering why you didn't just use sockets.

This doesn't happen with Erlang. You can develop your application on a single machine and when you finally deploy it into a real world environment distributed across the globe and connected via the internet it continues to work! The things that made Erlang good are exact same things that made CORBA terrible because they were butchered by CORBA engineers. Let's look at them here.

Ease of use. Writing programs in Erlang is simple. You don't have to spend two days just to figure out how to create a "hello world" application. It takes three seconds to send a message in Erlang. You end up spending almost all of your time working on the actual problem you're trying to solve rather than spending it on "glue" tasks like creating CORBA objects, locking and unlocking shared data, finding and resolving deadlocks, etc. You don't have to think about marshalling, and monikers, and apartments, and caches, and a million other concepts. If something doesn't work the way you expect it to in your Erlang application, you can easily find out why. You don't have to hit the books just to understand what exactly the tools have generated for you behind the scenes, how it works, and why it's failing. You can just debug your application as you normally would.
Managing state. Erlang is an inherently stateless system. If a process fails, another process handles the failure event and corrects the problem. Processes are expected to fail and state is managed explicitly. This makes Erlang incredibly resilient to failure. Contrast that with a CORBA system where your application isn't supposed to know anything about the object except the interface. If there is a network failure or the machine on which the remote object resides crashes, you're stuck with an error you don't know how to handle because you never expected person.getName() to throw an exception.
Asynchronous communication. When an application makes a synchronous call, by definition it blocks. In a highly concurrent distributed system where network transport is the main performance bottleneck this means that most of the time the processes are blocked waiting for the response from other processes to come. And what if the response never comes due to a severed network cable or an infinite loop on the other side? Are you prepared to handle plain vanilla methods throwing timeout exceptions? These problems spell a disaster for a high throughput fault tolerant system.

In Erlang all communication is done asynchronously. You send off your message and immediately move on with your life. If a remote node fails, it doesn't compromise any other nodes in any way. Your handlers interested in the failure register the crash, correct the problem (for example by restarting the node), and your application continues to work. No more obscure exceptions to ponder about while looking at a production log. Failure is no longer a mystery, it is an expected condition with proper safeguards built into the design.

Lightweight Threads

The "Erlang" approach has one significant problem: it encourages the use of processes. If you start coding an application in Erlang (or in our framework) you'll find yourself using processes more and more often. In many cases you'll have no way around it - you'll need to create extra processes that manage access to information because it cannot otherwise be freely shared. At other times you'll need to create processes that watch other processes and take appropriate action if they fail. Often times creating processes will simply be very convenient - it will solve a problem in a concise and elegant manner. For a very large set of problems a standard Erlang solution is to create a process.

These days every third grader knows that processes are expensive. Ok, so we're not using real processes, we're using threads, but if you constantly create, destroy, and switch between thousands of threads you'll bring any system to a crawl. Threads are expensive for a number of reasons. They take a long time and a lot of memory to create because the operating system needs to set up many things (the stack, internal data structures, etc.) for them to work. They take a long time to destroy because all the resources they consumed need to be freed. And they take a long time to switch between because unloading registers, storing them, loading them back, and flipping the stacks is complicated business.

Using native threads in Erlang is impractical. If we want our applications to work and scale we need to be able to freely create tens of thousands of threads on commodity hardware without worrying about slowing our machine. The natural way to go is for the runtime to implement a custom lightweight process system. This isn't a new concept. At some point even Java implemented this kind of threading. Early JVMs used to call such threads "green threads".

Erlang processes are lightweight threads. They're very cheap to start up and destroy and are very fast to switch between because under the hood they're simply functions. A typical Erlang system running on a modern desktop computer can switch between many tens of thousands such processes. Processes are switched every couple of dozen function calls which makes switches less granular but saves a tremendous amount of time normally wasted on context switching. This works out beautifully for Erlang servers. Go ahead, tell your boss he didn't need to buy that three hundred thousand Sun box loaded with a two hundred thousand latest and greatest Weblogic server. Who would have thought a mid-range Dell server with FreeBSD and a free Erlang installation would do just fine?

What's next?

You can learn more about Erlang on its website. I encourage everyone to download the runtime and have some fun. Try creating a simple chat server and clients. Even if you can't use Erlang at work you can apply some of the principles behind it to concurrent software written in any programming language. If you're ever responsible for choosing a technology for a highly scalable, concurrent, distributed application, and you're looking for an edge over your competitors, give Erlang a try. It's more effective than outsourcing and it's free.

Comments?

If you have any questions, comments, or suggestions, please drop a note at . I'll be glad to hear your feedback.

¹Believe it or not this is a praise to my management rather than a criticism. Considering the environment the company operates in, they're actually doing far better than one would expect in terms of innovation.