# Clojure Sequences

## Clojure Sequences

By Stephen Molitor, OCI Senior Software Engineer

MAY 2011

## Introduction

In our last SETT article on Clojure, we introduced the immutable style of programming favored by Clojure: Immutability in Clojure - Part 1, Programming Without Variables. We showed how recursion can be used to do things like looping over a collection that would ordinarily require setting mutable variables.

Traditionally, the functional programming languages Scheme and LISP that Clojure inherits from relied heavily on recursion. However we also said that recursion is not used quite as extensively in Clojure; instead, Clojure provides 'lazy sequences' which can provide a more efficient alternative to recursion in many cases.

This SETT article examines how to use lazy sequences in Clojure. We will also show how immutable collection data structures are efficiently shared via Clojure's persistent collections.

First, however, we will introduce Clojure's normal (non-lazy) sequences.

## Sequences Everywhere

In Clojure, every collection like data structure can be treated like a sequence:

• lists
• vectors
• sets
• maps

In addition, many other things can behave like sequences:

• strings
• regular expression matches
• directories
• I/O streams
• XML documents
• etc....

## Sequence Core API

A sequence supports three core functions:

first
gets the first item in a sequence
rest
gets everything after the first item in a sequence.
cons
constructs a new sequence

## Sequence Examples

In the following code examples the result of an expression is shown with the comment character ';' followed by '->' and the result, like this:

`(+ 2 2) ; -> 4`

`(+ 2 2)` is the actual Clojure expression, and `4` is the result of the expression.

### Getting the first element of a sequence

The following code creates a vector of the three integers 1, 2, 3, and then retrieves the first element ('1'):

`(first [1 2 3]) ; -> 1`

The `first` function returns nil if passed an empty sequence or nil. The `rest` function returns another sequence every element following the first element:

` (rest [1 2 3]) ; -> (2 3)`

`rest` will return an empty sequence when passed a sequence with less than two elements. The `cons` function constructs a new sequence from a single element and another sequence:

`(cons 4 [1 2 3]) ; -> (4 1 2 3)`

Building on these three core three sequence functions, `first``rest` and `cons`, Clojure's standard library provides functionality for filtering, querying and manipulating sequences similar to what one finds in the Java collections API, Guava collections, etc.

The full functionality of Clojure's sequences is outside the scope of this article, but let's take a peek at a few key functions to get the flavor of things.

Note that all of the examples that follow can be applied to any sequence like data structure in Clojure, including I/O streams, regular expression matches, XML documents, etc.

### Filtering Sequences

`(filter even? [1 2 3 4]) ; -> (2 4)`

The `filter` function takes a test function, in this case, the `even?` function, calls the function on each element in the original sequence, and returns a new (lazy) sequence containing all the elements in the original sequence for which the test function returned true.

### Querying Sequences

``` (every? even? [1 2 3 4]) ; -> false
(every? even? [2 4 6 8]) ; -> true```

The `every?` function takes a test function, in this case `even?`, and returns true if the test function returns true for every element in the sequence.

### Transforming Sequences

`(map #(* 2 %) [1 2 3]) ; -> (2 4 6)`

The `map` function takes a transformation function, calls the transformation function on each element on the sequence, and returns a new sequence of transformed elements. In this case we are using Clojure's abbreviated syntax for an anonymous function:

`#(* 2 %0) ; -> creates an anonymous function that multiplies its argument by 2`

The `#` character marks the beginning of an anonymous function definition, surrounded by parenthesis. The implicit single argument is referred to by the `%` symbol.

### Manipulating Sequences - Persistent Collections and Structural Sharing

Clojure sequences such as lists, vectors, sets and maps are immutable. So how do we 'add' items to a collection, if we can't modify the collection?

Instead of modifying the original collection, we create a new collection that contains all the elements of the original collection, plus the new element that we want to add.

Here is an example using maps:

``` (def m1 {"a" 1 "b" 2}) ; -> create map m1 with two key/value pairs
(def m2 (assoc m1 "c" 3)) ; -> 'add' new key/value pair to m1, assign to m2```

First we create map `m1` using the Clojure curly brace syntax for map literals. The `m1` has two key value pairs, "a" -> 1 and "b" -> 2. Next, we call the `assoc` function which takes three arguments: a key ("c"), a value (2), and an existing map (`m1`). The `assoc` call returns a new map containing all three key / value pairs.

A natural question arises here: Isn't this horribly inefficient? Surely we can't afford to make copies of entire collections every time we want to 'add' one element.

The answer is that Clojure's collection use a technique called 'structural sharing' to compute only the differences between two different versions of a collection and share the common bits in memory. Version control systems like Subversion or GIT use similar techniques.

Logically, a version control system stores separate copies of every version of a file. Physically, however, it only stores the diffs, and computes the full version of a file on demand. Clojure's 'persistent' collections work the same way.

Here is another example using lists. First let's create an initial list `l1` with two elements:

`(def l1 (list 1 2))`

Next let's create new two lists, `l2` and `l3` from contents of `l1` and 'adding' one extra element:

```(def l2 (cons 3 l1))
(def l3 (cons 4 l1))```

Logically, we have three separate lists. In memory however, lists `l2` and `l3` share the contents of the first list `l1`.

We can demonstrate this as follows.

First let's verify that the contents of the 'rest' of `l2` and `l3` are equivalent to the contents of `l1`:

```(= (rest l2) l1) ; -> returns true
(= (rest l3) l1) ; -> returns true```

That just proves that the `cons` function worked correctly.

But let's see if the contents are identical and therefore shared in memory:

```(identical? (rest l2) l1) ; -> returns true
(identical? (rest l3) l1) ; -> returns true```

The `identical?` tests to see if two objects point to the same region in memory. It is similar to the `==` operator in Java. As we can see, the contents of list `l1` are shared between all three lists,`l1``l2` and `l3`.

## Lazy Sequences

With the introduction to basic sequences out of the way let's look at lazy sequences.

A lazy sequence in Clojure is a virtual sequence that behaves just like a 'real' sequence except that its elements are computed on demand, lazily.

The `filter` and `map` functions demonstrated above actually return lazy sequences.

### Infinite Sequences

Since each element is realized on demand, we can even create infinite lazy sequences (as long as we don't try to realize every element in the infinite sequence).

An example of an infinite sequence would be the set of all whole numbers. Let's create a lazy sequence that produces whole numbers.

To create a lazy sequence we use the `lazy-seq` macro. The `lazy-seq` macro takes an expression that returns a sequence, and only invokes that expression as it is needed. Our first version creates a lazy sequence that increments from a starting number:

```(defn whole-numbers-from [n]
(cons n (lazy-seq (whole-numbers-from (inc n)))))```

We use the `cons` function to create a sequence whose first element is `n`, and the rest is the lazy sequence returned by recursively calling `whole-numbers-from` again, incrementing the value of `n` by one. The key is the call to `lazy-seq`. Without this the function would recur forever, until the end of time or we run out of whole numbers, whichever comes first. (Actually we would get a stack overflow error first). With the `lazy-seq` call, the information needed to generate each subsequent element is remembered, or memorized, to be produced on demand, as needed. We can retrieve a finite set of elements from our logically infinite sequence using the `take` function:

`take 3 (whole-numbers-from 0)) ; -> (0 1 2)`

We use the `cons` function to create a sequence whose first element is `n`, and the rest is the lazy sequence returned by recursively calling `whole-numbers-from` again, incrementing the value of `n`by one. The key is the call to `lazy-seq`. Without this the function would recur forever, until the end of time or we run out of whole numbers, whichever comes first. (Actually we would get a stack overflow error first). With the `lazy-seq` call, the information needed to generate each subsequent element is remembered, or memoized, to be produced on demand, as needed. We can retrieve a finite set of elements from our logically infinite sequence using the `take` function:

`(take 3 (whole-numbers-from 0)) ; -> (0 1 2)`

As each element is produced, `whole-numbers-from` is called again with the incremented value of`n`, and the information for the previous call to `whole-numbers-from` is discarded and eligible for garbage collection. Thus nothing is placed on the stack; the stack does not nest more than one level and we do not get a stack overflow, even for large values of n.

We can provide a simpler version that starts at 0 using the built in `iterate` function:

`(def whole-numbers (iterate inc 1))`

The `iterate` function takes a function and starting value as arguments, and produces a lazy sequence of `n, (f n), (f (f n))` etc. Note that we do not need to create `whole-numbers` as a function definition using `defn`. Instead we assigned the lazy sequence returned by the iterate call directly to the symbol definition `whole-numbers` using `def`. We can call our `whole-numbers`definition like this:

`(take 3 whole-numbers) ; -> (1 2 3)`

We can call standard sequence functions including `first``filter``map`, etc on our lazy sequence:

```(first whole-numbers) ; -> 1
(take 3 (map #(* 2 %) whole-numbers)) ; -> (2 4 6)```

The call to `take` is very important - we can't transform the entire set of whole numbers without running out of memory first!

## Simple Recursion, Tail Recursion, Lazy Sequences, Standard Sequence Library - A Tale of Four Fibonaccis

To review what we have covered so far, let's look at four implementations for calculating the numbers of the Fibonacci sequence.

The Fibonacci numbers are defined as follows: The first two Fibonacci numbers are 0 and 1, and each of the following numbers is the sum of the previous two. So the third Fibonacci number is 1, because 0 + 1 = 1. The fourth Fibonacci number is 2, because 1 + 1 = 2. The first ten Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.

We will create four implementations for calculating the Fibonacci sequence.

1. The first naive implementation will use normal, non-tail recursion. It should be easy enough to follow, but will not be efficient.
2. Second, we will refactor this implementation to use tail recursion. This version will be more efficient but harder to follow.
3. Next we will implement the sequence using lazy sequences.
4. Finally we will look at an elegant implementation that uses some of the built in functions in Clojure's standard sequence library.

### 1. Simple (non-tail) Recursion

Our first, naive implementation matches the problem definition well.

`           ; naive implementation:            (defn fib [n]              (condp = n                  0 0                  1 1                  (+ (fib (- n 1)) (fib (- n 2)))))             ; call like this:            (fib 9)  ; -> 34`

The `condp` macro is a kind of switch statement. It takes the name of a two argument test function, `=` in this case, a value to use as the first argument to the test function, and a set of clauses where the first element of the clause is passed to the test function and the second part of the clause is the code to execute if the test evaluates to true.

So, our `fib` function returns 0 if `n` is 0, 1 if `n` is 1, and otherwise returns the sum of the previous two Fibonacci numbers. This fits our English language description of the Fibonacci sequence above well: the first Fibonacci number (0) is 0, the second (1) is 1, and each following number is the sum of the previous two. However, this implementation will not work for large values of `n`:

`(fib 10000000) ; -> java.lang.StackOverflowError :(`

Each call to `fib` recursively places two more calls on the stack. You will quickly overflow the JVM stack with this approach.

### 2. Tail Recursion

In the previous article in this series, we saw how to use the `recur` function to replace simple recursion with tail call recursion. Tail call recursion allows the compiler to unroll the recursion into a non-recursive loop that is more efficient.

In Clojure this is achieved by replacing the recursive function call with `recur`. The call to `recur` must be last expression evaluated in the function (thus the term 'tail call'). However, in our `fib` function the `+` function is the last expression. The two recursive calls to `fib` happen before the `+` call. If we attempt to replace the two calls to `fib`with `recur` our code will throw a `java.lang.UnsupportedOperationException`. We will have to rearrange our function to allow for just one recursive call, and that call will need to be the last expression evaluated. One approach is to introduce two accumulator values to 'remember' the two previous Fibonacci numbers on the stack:

`      (defn tail-fib            ([n] (tail-fib n 0 1))            ([n prev2 prev1]               (condp = n                   0 prev2                   1 prev1                   2 (+ prev1 prev2)                   (recur (dec n) prev1 (+ prev1 prev2)))))`

Our new function is overloaded based on the number of arguments based. If passed a single argument `n`, it calls itself with three arguments: `(tail-fib n 0 1)`. The the last two arguments, `prev1` and `prev2` are the accumulator values that 'remember' the last two Fibonacci numbers calculated. Since the first two numbers of the sequence are 0 and 1 we can hard code those two and then start accumulating, with the call to `recur`. The `recur` is the last expression of the function.. This implementation works, even for large values of `n`, but is harder to read. It does not match the problem definition as well.

### 3. Replacing Recursion with Lazy Sequences

The phrase 'Fibonacci sequence' suggests an alternative implementation: a lazy sequence that calculates each succeeding Fibonacci number on demand. This approach turns out to be fairly straightforward (once one is comfortable with lazy sequences), and efficient:

`       (def fib-seq              ((fn fib [prev1 prev2]                 (lazy-seq (cons prev1 (fib prev2 (+ prev1 prev2)))))              0 1))         (take 10 fib-seq) ; -> (0 1 1 2 3 5 8 13 21 34)        (nth fib-seq 9) ; -> 34`

The definition of `fib-seq` is similar to `whole-numbers-from` above. The `fn` call creates a little helper function, `fib`, that takes the two previously accumulated numbers in the sequence, `prev1` and `prev2`. It returns a lazy sequence whose first element is `prev1`. The following elements are the result of the `fib` helper function recursively calling itself, passing `prev2` and the sum of the previous two numbers. After creating the helper function `fib`, our sequence definition invokes `fib` with the first two values of the sequence, `0` and `1`. This is the reason for the double level of parenthesis around the `fn` call: first the function `fib` is defined, and then it is called with the arguments `0` and `1` to see our sequence. Since this recursive call is placed inside the call to `lazy-seq`, it is only invoked on demand, as needed, one element at a time. The stack does not nest and we do not run out of stack space, even for large values of `n`.

### 4. Using the Standard Sequence Library

The Clojure standard library comes with many useful functions that create lazy sequences. We have already seen two of them, `iterate` and `map`. We can use these functions to create a simpler version of the Fibonacci sequence:

`            (def fibs (map first (iterate (fn [[prev1 prev2]] [prev2 (+ prev1 prev2)]) [0 1])))             (take 10 fibs) ; -> (0 1 1 2 3 5 8 13 21 34)            (nth fibs 9) ; -> 34`

The call to `iterate` loops over pairs of the two previously accumulated Fibonacci numbers. Recall that `iterate` returns a lazy sequence. The Fibonacci sequence itself is simply the first element of each pair, so the call to `(map first...` simply translates the nested sequence of pairs into single elements, the elements of the Fibonacci sequence. The `map` function also returns a lazy sequence. This version was suggested by Christophe Grand and described in the book Programming Clojure by Stuart Halloway. This version reuses existing functions, is the simplest version, and is efficient.

## Summary

In this SETT we demonstrated how Clojure's persistent collection library uses structural sharing to allow for efficient use of immutable collections. We also showed how Clojure's lazy sequences can create efficient, virtual collections over large (even infinite) sequences. We implemented different methods of calculating the Fibonacci sequence to show how to replace simple recursion with more efficient tail recursion, how to replace tail (or simple) recursion with lazy sequences, and how the lazy sequence functions in the Clojure standard library often remove the need to implement your own lazy sequences.

## References

The Software Engineering Tech Trends (SETT) is a monthly newsletter featuring emerging trends in software engineering.