As you may have heard in the upcoming 0.12.0 release Elixir’s enumerators gained some new features. In this blog post I’ll explain what’s new, what it enables and how it works.

For those of you who use the development version of Elixir these changes are already available. For the exact differences in code you can look at the relevant pull request.

A recap of enumerators, and some terminology

The basic idea of enumerators is that you traverse some data structure or resource (lines from a file) by putting the thing that is traversed in control. That is if you’re reading from a file you have a loop that reads lines from a file and for each line calls a function. Just calling a function isn’t all that useful for most tasks as there’d be no way to remember previous lines (ugly hacks aside), so some accumulator value is passed to the function and a new accumulator is returned by it.

For example here’s how you can count the total length of strings in a list.

``````Enumerable.reduce(l, 0, fn x, acc -> String.length(x) + acc end)
``````

Often the actual call to `Enumerable.reduce/3` is hidden inside another function. Say that we want to define a `sum` function. The usual way is to write it like this:

``````def sum(coll) do
Enumerable.reduce(coll, 0, fn x, acc -> x + acc end)
end
``````

This could get called as `Enum.map(1..10, &(&1 * &1)) |> sum()` to get the sum of squares. Desugaring this means `sum(Enum.map(1..10, &(&1 * &1)))`.

The general pattern is this:

``````def outer_function(coll, ...) do
...
Enumerable.reduce(coll, initial_consumer_acc, consumer)
...
end

something_that_returns_an_enumerable(...) |> outer_function(...)
``````

You’ll notice the slightly uncommon terminology of “outer function” and “consumer” (normally called an “iteratee”). That’s intentional, naming an iteratee a consumer better reflects that it consumes values.

Along the same lines I call the reduce function for a specific enumerable a producer, it produces values which are given to a consumer.

The outer function is the function to which the enumerable is passed. Syntactically it looks like this is the consumer, but it’s really a function that combines the producer and the consumer. For simple consumers (say ```fn x, acc -> length(x) + acc end```) the consumer will often be written directly in the source text of the outer function, but let’s try to keep those concepts distinguished.

Two issues with classic Elixir enumerators

Enumerators are great, but they have their limitations. One issue is that it’s not possible to define a function that only returns at most 3 elements without traversing all elements or using ugly tricks such as `throw` (with a `try...catch` construct in the outer function). The `throw` trick is used in `Enum` and `Stream` to implement functions such as `Enum.take/2` and `Stream.take_while/2`. It works, but it’s not what I’d call stylish.

A bigger problem, that doesn’t have a workaround, is that there’s no way to interleave two enumerables. That is, it’s not possible to define a function that for two enumerables `A` and `B` returns a list `[A1, B1, A2, B2, A3, ...]` (where `A1` is the first element of A) without first traversing both lists and then interleaving the collected values. Interleaving is important because it’s the basis of a zip function. Without interleaving you cannot implement `Stream.zip/2`.

The underlying problem, in both cases, is that the producer is fully in control. The producer simply pushes out as many elements to the consumer as it wants and then says “I’m done”. There’s no way aside from `throw/raise` for a consumer to tell a producer “stop producing”. There is definitely no way to tell a producer “stop for now but be prepared to continue where you left off later”.

Power to the consumer!

At CodeMeshIO José Valim and Jessica Kerr sat down and discussed this problem. They came up with a solution inspired by a Monad.Reader article (third article). It’s an elegant extension of the old system, based on a simple idea. Instead of returning only an accumulator at every step (for every produced value) the consumer returns a combination of an accumulator and an instruction to the producer. Three instructions are available:

• `:cont` - Keep producing.
• `:halt` - Stop producing.
• `:suspend` - Temporarily stop producing.

A consumer that always returns `:cont` makes the producer behave exactly the same as in the old system. A consumer may return `:halt` to have the producer terminate earlier than it normally would.

The real magic is in `:suspend` though. It tells a producer to return the accumulator and a continuation function.

``````{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, fn x, n ->
if x == 3 do
{ :suspend, n }
else
{ :cont, n + x }
end
end)
``````

After running this code `n_` will be `3` (1 + 2) and `cont` will be a function. We’ll get back to `cont` in a minute but first take a look at some of the new elements here. The initial accumulator has an instruction as well, so you could suspend or halt a producer immediately, if you really want to. The value passed to the consumer (`n`) does not contain the instruction. The return value of the producer also has a symbol in it. Like with the instructions of consumers there are three possible values:

• `:done` - Completed normally.
• `:halted` - Consumer returned a `:halt` instruction.
• `:suspended` - Consumer return a `:suspend` instruction.

Together with the other values returned the possible return values from a producer are ```{ :done, acc } | { :halted, acc } | { :suspended, acc, continuation }```.

Back to the continuation. A continuation is a function that given an accumulator returns a new producer result. In other words it’s a way to swap out the accumulator but keep the same producer in the same state.

Implementing `interleave`

Using the power of suspension it is now possible to create an interleave function.

``````defmodule Interleave do
def interleave(a, b) do
step = fn x, acc -> { :suspend, [x | acc] } end
af = &Enumerable.reduce(a, &1, step)
bf = &Enumerable.reduce(b, &1, step)
do_interleave(af, bf, []) |> :lists.reverse()
end

defp do_interleave(a, b, acc) do
case a.({ :cont, acc }) do
{ :suspended, acc, a } ->
case b.({ :cont, acc }) do
{ :suspended, acc, b } ->
do_interleave(a, b, acc)
{ :halted, acc } ->
acc
{ :done, acc } ->
finish_interleave(a, acc)
end
{ :halted, acc } ->
acc
{ :done, acc } ->
finish_interleave(b, acc)
end
end

defp finish_interleave(a_or_b, acc) do
case a_or_b.({ :cont, acc }) do
{ :suspended, acc, a_or_b } ->
finish_interleave(a_or_b, acc)
{ _, acc } ->
acc
end
end
end

Interleave.interleave([1, 2], [:a, :b, :c, :d])
#=> [1, :a, 2, :b, :c, :d]
``````

Lets go through this step by step. The main `interleave` function first partially applies `Enumerable.reduce/3` to get function values that work just like the continuations. This makes things easier for `do_interleave`.

The `do_interleave` function first calls `a` (`af` from `interleave`) with the `step` function so that the available element of `a` gets added to the accumulator and `a` immediately suspends afterwards. Then the same is done for `b`. If either producer is done all the remaining elements of the other get added to the accumulator list.

Note that `acc` is sometimes used to mean a tuple like `{ :cont, x }` and sometimes the accumulator value proper. It’s a bit confusing, yes.

This example shows that through clever combination of an outer function (`do_interleave`) and an inner function `step` two producers can be interleaved.

Conclusion

The new system of enumerators certainly makes things a bit more complicated but also adds power. I suspect many interesting and “interesting” functions can be built on top of it.