Immutability is becoming a real trend in computing, from programming languages and infrastructure to the application and data layers. In this three-part series we’ll see how immutability is changing the way we build systems and enabling a new breed of applications to be designed. But first, a bit of background…
I began my career in IT with a fascination for one particular computing paradigm called TupleSpaces, invented by the brilliant David Gelernter and subsequently implemented as a language called ‘Linda’. Gelernter developed his ideas in a forward-thinking book called “Mirror Worlds: or the Day Software Puts the Universe in a Shoebox”, a book so groundbreaking that it made him the target of a mail bomb from the Unabomber, which left him seriously injured.
As soon as I started to program I grew to care about the general aesthetics of computing, and to this day I find TupleSpace one of the simplest and most powerful ways to build distributed systems. At its core it’s an associative shared memory that relies on a very small set of operations: Read, Take, Write. The data model for TupleSpace, as its name hints, is the tuple: an ordered collection of elements.
Tuples in a TupleSpace are inherently immutable. If you want to modify the value of a tuple, you have two options. You can take the tuple (which removes it from the space), modify its content, and then write a fresh copy with the updated content. Or you can just read the tuple and write an updated copy of it, leaving the first copy intact in the space.
Either way you achieve, without even thinking about it, a very simple and safe way to coordinate a distributed system and to deal with concurrency. The tuple either exists or it doesn’t, but you never risk getting an outdated version of it.
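To make the three operations concrete, here is a minimal in-memory sketch in Python. It is not Linda's actual API: the `TupleSpace` class, the use of `None` as a wildcard in patterns, and the non-blocking behaviour of `read` and `take` are all assumptions made for illustration (a real tuple space is distributed, and its read and take operations typically block until a match appears).

```python
import threading
from typing import Any, Optional, Tuple

# Hypothetical pattern type: None in a position acts as a wildcard.
Pattern = Tuple[Any, ...]


class TupleSpace:
    """Toy single-process tuple space with write, read and take."""

    def __init__(self) -> None:
        self._tuples: list = []
        self._lock = threading.Lock()

    @staticmethod
    def _matches(pattern: Pattern, tup: tuple) -> bool:
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def write(self, tup: tuple) -> None:
        """Add a new tuple to the space."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def read(self, pattern: Pattern) -> Optional[tuple]:
        """Return a matching tuple without removing it, or None."""
        with self._lock:
            for tup in self._tuples:
                if self._matches(pattern, tup):
                    return tup
        return None

    def take(self, pattern: Pattern) -> Optional[tuple]:
        """Atomically remove and return a matching tuple, or None."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._matches(pattern, tup):
                    return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("job", "romefort", "Architect"))

# "Modify" a tuple by taking the old one and writing a fresh copy.
old = space.take(("job", "romefort", None))
space.write(old[:2] + ("Head of Engineering",))

current = space.read(("job", "romefort", None))
```

Because a tuple is never edited in place, any other worker looking for the `"job"` tuple during the update either finds a complete tuple or finds nothing at all.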
This leads us to one fundamental feature of immutable systems: they allow different parts of a system to work on shared data in a thread-safe manner, and this is actually a big deal. Concurrent systems are by nature hard to build when you have shared mutable variables. Remove mutability, and your system becomes singularly simpler.
Back then (Linda, the first implementation of the TupleSpace concept, can be traced back to 1985), computing and storage were expensive. Immutability, though it made parallel programming easier to reason about, was not exactly the most cost-efficient technique: instead of replacing an old value in the same memory location, you’d potentially need more memory or storage to achieve the same result.
It’s 2015 now: storage is quasi-infinite, memory is cheap, and even cheap computers have multi-core CPUs. Launching a cluster of thousands of machines across several datacenters can be done in just a few commands. All of this makes immutability across the stack not only a possibility, but also a paradigm shift in the way we think about computer systems.
Let’s go through several layers of the application stack to see examples of immutability and how it contributes to better systems. A good way to start is deep down the rabbit hole, with computing’s lowest software layer: programming languages.
Functional Programming and Immutability:
Let’s first look at programming languages. Object-oriented programming is inherently hard: one way or another, you end up with several threads trying to access a shared mutable context. We’ve all been there, and gained some gray hair as a result. It might not be a coincidence that I saw the beauty in immutable data, given that the first programming language I learnt was LISP, a functional programming language. Clojure, a Lisp dialect running on the JVM, draws a lot of its principles from Haskell, which is in turn strongly focused on data immutability.
The recent rise of functional programming is, in my opinion, tied to the immutable nature of the way it deals with data.
Functional Programming definition from Wikipedia:
In computer science, functional programming is a programming paradigm—a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data. […] Eliminating side effects, i.e. changes in state that do not depend on the function inputs, can make it much easier to understand and predict the behavior of a program, which is one of the key motivations for the development of functional programming.
In an imperative language, the state of a program determines the output of a particular expression. This becomes particularly problematic, and a source of headaches, when shared state is manipulated by concurrent processes: either on the same machine, with threads, or worse, in a distributed system, where the “Fallacies of Distributed Computing” come into full swing.
The developer has to make sure several processes won’t modify the data at the same time by resorting to mechanisms such as mutexes. Reasoning about shared state accessed by many processes is hard: a lot of details about the program’s state of execution have to be taken into consideration, race conditions among them.
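As a small illustration of the mutex approach, the sketch below has four threads incrementing one shared counter. The names (`counter`, `increment`) and the thread and iteration counts are arbitrary choices for the example. The lock makes each read-modify-write step exclusive, and it is exactly the kind of detail the developer has to remember everywhere the shared state is touched.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times: int) -> None:
    """Add 1 to the shared counter `times` times."""
    global counter
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below.
        with lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Forgetting the `with lock:` line in just one code path is enough to reintroduce a race condition, which is why this style is hard to get right at scale.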
With immutable data structures, two processes can never modify the same data at the same time, which is what could otherwise leave it in an inconsistent state. The program stays consistent, and that safety means reliability and predictability, which are generally what you should be aiming for when developing a computer system.
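Here is one way this can look in practice, sketched with Python's standard `dataclasses` module. The `Account` type and its fields are hypothetical, chosen only to illustrate the idea: an in-place change is rejected, and an "update" produces a new value while the original stays intact for anyone still holding it.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:
    username: str
    job: str


original = Account(username="romefort", job="Architect")

# An in-place change is rejected at runtime...
try:
    original.job = "Head of Engineering"
    mutated = True
except FrozenInstanceError:
    mutated = False

# ...so an "update" builds a new value and leaves the old one untouched.
updated = replace(original, job="Head of Engineering")
```

Any thread that received `original` can keep reading it without a lock, because nothing can change it underneath.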
Immutability, in this context, also has another interesting side effect: it can contribute to improving security by making some common classes of exploits harder, such as those that rely on overwriting parts of memory the program is not supposed to write to.
Some functional programming languages, like Clojure, can support mutability but strongly discourage using it.
With the advent of multi-core architectures, it becomes harder and harder to reason about a multi-threaded, shared-state environment. At the same time, with the availability of cheap DRAM, keeping all the states of an object in memory becomes realistic, making today a sweet spot for generalizing the use of functional languages.
Databases and Immutability:
A database is a shared persistent view on a dataset. Databases are inherently mutable systems: programs can concurrently access data and modify it. The information contained in database tables is the result of a number of operations in time reflecting the evolution of the underlying system. For example, a user can create an account which results in a new row in the user table. In that case we’ll have the following row added to the database:
username | company | job
---------------------------
romefort | ACME    | Architect
If tomorrow I change my position from Architect to Head of Engineering, I’ll just execute a SQL statement to change the value of the job field. Probably nobody sees any challenge here, because my job is most likely not a value that is changed concurrently by multiple processes. One should note, though, that after the update operation I will have lost the previous value of the job field, and I won’t be able to retrieve a history of the changes to this field. The update operation is destructive.
While I modify my table row, other operations are performed on the database by multiple components, be it rows added to the same table or deleted from another one. And suddenly my database server crashes. Given the need to lock parts of the table to update values and create new ones, it’s extremely difficult to know what the last state before the crash was, and even harder to know whether this state is *consistent*.
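The destructive nature of the update can be seen in a short sketch using Python's built-in `sqlite3` module and an in-memory database. The `users` table mirrors the row shown earlier; the `user_history` table and its `version` column are a hypothetical append-only alternative, included only for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mutable design: one row per user, overwritten in place.
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, company TEXT, job TEXT)"
)
conn.execute("INSERT INTO users VALUES ('romefort', 'ACME', 'Architect')")
conn.execute(
    "UPDATE users SET job = 'Head of Engineering' WHERE username = 'romefort'"
)
mutable_rows = conn.execute("SELECT job FROM users").fetchall()

# Append-only design (hypothetical schema): every change is a new row.
conn.execute(
    "CREATE TABLE user_history ("
    "version INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT, company TEXT, job TEXT)"
)
conn.execute(
    "INSERT INTO user_history (username, company, job) "
    "VALUES ('romefort', 'ACME', 'Architect')"
)
conn.execute(
    "INSERT INTO user_history (username, company, job) "
    "VALUES ('romefort', 'ACME', 'Head of Engineering')"
)
history = conn.execute(
    "SELECT job FROM user_history WHERE username = 'romefort' ORDER BY version"
).fetchall()
current_job = history[-1][0]
```

After the `UPDATE`, the first table has no trace of the word "Architect". The second table can still answer both "what is the job now?" and "what was it before?".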
Data immutability allows a new range of applications to be created. Bitcoin, for example, is based on a blockchain: once a transaction has been included in a block in the blockchain (backed by proof-of-work, through a process called “mining”), it is effectively immutable and authoritative.
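The reason an included transaction is so hard to change can be sketched with a toy hash chain. This is a deliberately simplified illustration, not Bitcoin's actual block format: there is no proof-of-work, no network, and the `Block` type, the helper functions and the transaction strings are all made up for the example. It only shows that each block commits to the hash of the previous one, so rewriting history is detectable.

```python
import hashlib
import json
from typing import List, NamedTuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


class Block(NamedTuple):
    prev_hash: str
    transactions: tuple
    digest: str


def block_hash(prev_hash: str, transactions: tuple) -> str:
    payload = json.dumps([prev_hash, list(transactions)]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], transactions: tuple) -> List[Block]:
    """Return a new chain with one more block; the input is not modified."""
    prev_hash = chain[-1].digest if chain else GENESIS
    digest = block_hash(prev_hash, transactions)
    return chain + [Block(prev_hash, transactions, digest)]


def is_valid(chain: List[Block]) -> bool:
    """Check that every block's digest and back-link are consistent."""
    prev_hash = GENESIS
    for block in chain:
        if block.prev_hash != prev_hash:
            return False
        if block.digest != block_hash(block.prev_hash, block.transactions):
            return False
        prev_hash = block.digest
    return True


chain: List[Block] = []
chain = append_block(chain, ("alice pays bob 5",))
chain = append_block(chain, ("bob pays carol 2",))

# Rewriting an old transaction without recomputing the hashes is detectable.
tampered = [chain[0]._replace(transactions=("alice pays bob 500",)), chain[1]]
```

In this sketch `is_valid(chain)` holds while `is_valid(tampered)` does not. In a real blockchain, proof-of-work additionally makes recomputing all the later hashes prohibitively expensive.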
If we change our perspective on data and start to see the operations in a system as a persistent series of time-ordered events, from which the current state can be derived as materialized views, we achieve a few very desirable behaviors. First, we have a full history of our data, which means that if, for any reason, we need to go back in time to analyze events in a past time window, we are able to do it. This is particularly interesting because it enables a new class of analytic systems, which can be built after the system has been running for a while and still have access to the whole picture. It’s difficult, if not impossible, to think about everything beforehand, and most analytic systems can only be built when the historical sample of data is large enough to hint at what you’re looking for. A second consequence of data immutability is that you can materialize the operations that have been applied to the system as a new view, tailored to your needs.
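Both consequences can be sketched in a few lines of Python. The `Event` type, the integer logical timestamps and the two view functions are assumptions made for illustration. The point is that the log is the only thing stored, and each view is just a function over it that can be written long after the events were recorded.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    timestamp: int  # logical time, assumed to be non-decreasing in the log
    username: str
    field: str
    value: str


# The append-only log is the source of truth.
log: List[Event] = [
    Event(1, "romefort", "company", "ACME"),
    Event(1, "romefort", "job", "Architect"),
    Event(2, "romefort", "job", "Head of Engineering"),
]


def current_state(events: List[Event], as_of: int) -> Dict[str, Dict[str, str]]:
    """View 1: the latest value of each field, as of a given time."""
    state: Dict[str, Dict[str, str]] = {}
    for e in events:
        if e.timestamp <= as_of:
            state.setdefault(e.username, {})[e.field] = e.value
    return state


def job_history(events: List[Event], username: str) -> List[str]:
    """View 2: every job a user has held, in log order."""
    return [
        e.value for e in events if e.username == username and e.field == "job"
    ]


now = current_state(log, as_of=2)
yesterday = current_state(log, as_of=1)
jobs = job_history(log, "romefort")
```

Here `yesterday` is the "go back in time" case, and `job_history` is a view nobody needed to anticipate when the first event was written.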
Data immutability is at the core of the so-called Lambda Architecture, a “generic, scalable and fault-tolerant data processing architecture”. At its core, this architecture is built around logs of data. The log is a very simple immutable structure of chronologically ordered events. It can be processed by a multitude of consumers at the same time, leaving room for both fast-path and batch data processing. Technologies enabling this kind of architecture are now reaching maturity: Kafka can be used as the log streaming platform, and Samza to actually process the flow of data.
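The idea of several consumers reading one log independently can be sketched as follows. This is a toy, single-process model and not the Kafka or Samza API: the `Log` class, its `append` and `poll` methods, and the consumer names are hypothetical. What it does show is that records are never modified or removed, and each consumer only keeps track of its own position.

```python
from typing import Any, Dict, List


class Log:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []
        self._offsets: Dict[str, int] = {}

    def append(self, record: Any) -> int:
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 100) -> List[Any]:
        """Return records this consumer has not seen yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = Log()
for value in (3, 5, 8):
    log.append(value)

# A "fast path" consumer reads events shortly after they arrive...
fast_first = log.poll("fast-path", max_records=2)
log.append(13)
fast_second = log.poll("fast-path")

# ...while a batch consumer reads the whole log later, unaffected by the other.
batch_total = sum(log.poll("batch"))
```

Because reading never changes the log itself, adding a third consumer later requires no coordination with the first two.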
Storage and predictable access to physical disk space used to be an issue, which justified reusing a given space to store successive versions of a value. Today storage is cheap, and SSDs make read and write operations more predictable in terms of performance. Immutable data management is economically viable. Furthermore, the technologies to store and process a complex stream of data are now a reality and have recently entered a phase of maturation. The convergence of these elements makes immutable data a reality for the enterprise applications of tomorrow.
On to the next article…
In the second part of this series, we’ll talk about immutable infrastructure, enabled by the rise of the cloud, unikernels and a new type of read-only operating system like CoreOS. We’ll also talk about how Docker and container technologies in general are a building block for creating predictable deployments inside immutable units.
In the third installment of this series, we’ll talk about how immutability can also change the way we develop web applications. We’ll see how React.js, Immutable.js and Flux make use of immutable data to simplify the flow of data from the backend up to what the user sees in their browser.