## Generators, iterators, control and continuations

- March 3, 2013
- Last updated on 2013/07/10

Batteries contributors recently started a new discussion on the Enum module, whose purpose is to provide an good intermediate representation for conversion between data-structures, but whose implementation is perceived as too complicated and not well-understood enough. Going back to fundamentals (forgetting about half the features of Enum), what is the ultimate intermediate traversal interface?

There are two obvious choices in an effectful setting: a *generator*, that is a "next-element" function of type `(unit -> 'a)`

that generates the next element each time it is called, or an exception to signal end of input; or an *iterator*, a fold-function of type `('a -> unit) -> unit`

, that iterates a consumer function through all elements of the list.

Both are suitable as an abstract representation of any sequence (the second is precisely the basis for Simon Cruanes' Sequence library), and their are at opposite ends of the *control* spectrum: generators give control to the *consumer* of the sequence (she decides when to call the generator function), while iterators give control to the *producer* of the sequence (she decides when to call the iterated function). It's easy to transform a structure into a iter (`to_iter`

), or to build a structure from a next (`of_next`

), because you have the control. But as a library writer, you should also write the without-control conversions (`of_iter`

and `to_next`

), despite them being slower and harder to implement, so that your users can live the easy in-control life!

Finally, continuation capture is a well-known technique to *invert* control, letting you write code in "direct style", as if you were in control, when you're not. In this post, I will demonstrate how you can start from a conversion function that has control and, by systematic source-to-source transformations of conversion to continuation-passing-style (CPS) and defunctionalization, obtain a conversion function that works without the control.

We are interested in datatypes that support the following interface:

```
type 'a iter = ('a -> unit) -> unit
type 'a gen = unit -> 'a
module type Data = sig
type 'a t
val of_iter : 'a iter -> 'a t
val to_iter : 'a t -> 'a iter
val of_gen : 'a gen -> 'a t
val to_gen : 'a t -> 'a gen
end
```

Note that there is no support for predicting the size of the data structure: those will not be good interfaces for fixed-size structures such as `array`

. We will use lists as a warm-ups, and then concentrate on binary trees as a structure with more complicated traversal and construction patterns.

#### Lists as a warm-up

```
module List : Data with type 'a t = 'a list
= struct
type 'a t = 'a list
(* producer, with control *)
let to_iter li = fun f ->
let rec loop = function
| [] -> ()
| x :: xs -> f x; loop xs
in loop li
(* producer, without control *)
let to_gen li =
let st = ref li in
fun () -> match !st with
| [] -> raise Exit
| x::xs -> st := xs; x
(* consumer, with control *)
let of_gen gen =
let st = ref [] in
try while true do st := gen () :: !st done; raise Exit
with Exit -> List.rev !st
(* consumer, without control *)
let of_iter iter =
let st = ref [] in
iter (fun x -> st := x :: !st);
List.rev !st
end
```

Lists are no surprise. Both consumers, with or without control, are extremely simple to write. For producers, we would argue that `to_iter`

is more convenient to express, and more elegant, than `to_gen`

, but the difference is again small.

I did a bit of shallow performance testing, and without surprise the version with control was always faster than the version without control. I also tested the compositions `of_gen (to_gen li)`

and `of_seq (to_seq li)`

and the `gen`

versions, with the simpler types, seems slightly faster -- but it's hard to get anything meaningful with code so simple.

#### A generator for binary trees

The real meat of this post is the work on binary trees. Iterating on binary trees is easy, and will serve a good example of continuation-passing-style (CPS) transform. Building trees from sequences is more subtle, and will make for a more sophisticated application of the same principles.

We use binary trees with data on the nodes, and empty leaves.

```
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
```

Do you know the trick to create a tree of `O(2^n)`

nodes in `O(n)`

time? It's useless for this post, but I've been using it for performance evaluation, so I'll leave it here.

```
let rec deep = function
| 0 -> Leaf
| n -> let t = deep (n - 1) in Node (t, n, t)
let long_tree = deep 18
```

We will start with the simple and well-known `iter`

function:

```
let rec iter f = function
| Leaf -> ()
| Node (left, x, right) -> iter f left; f x; iter f right
let to_iter t f = iter f t
```

We wish to get a *generator* from a tree (`to_gen`

), starting from the code of the `to_iter`

function. `iter`

has the control, it decides when to call the function `f`

. A single (sub-)call to the iter function may call `f`

one time, several times (recursively), or not at all. On the contrary, the difficulty in writing a generator is to understand how to advance precisely to the next element of the data-structure, one step at the time, when the user calls the generator.

Our first transformation will be to turn `iter`

into continuation-passing-style. We have already covered this transformation in a previous blog series, as a tool to get tail-recursive implementations of any recursive functions, trading stack space for heap-allocated closures. The idea is to capture the "context" (what will be done to its return value) of each recursive call as an additional parameter of the recursive function.

```
let rec iter_cps f t k = match t with
| Leaf -> k ()
| Node (left, x, right) ->
iter_cps f left (fun () ->
f x; iter_cps f right k)
let to_iter t f = iter_cps f t (fun result -> result)
```

Another way to see this transformation is to consider that we abstract on the "return address" of the function; we may call the continuation parameter `return`

, and the function then reads `| Leaf -> return ()`

, etc.

The next transformation step is to remove the higher-order aspect associated to our representation of continuation as functions. We could in fact stop here, and derive a generator from this iterator in CPS, but using a first-order data-centric representation will give us an algorithm that is closer to the algorithm that people tend to write naturally -- and will also be a bit more efficient, but I won't insist on performance considerations.

The idea of *defunctionalization* is to collect all the different lambda-terms that are used in the function, and give each of them a name (a symbol, a piece of data). Instead of the lambda-term, we will only pass the name as the `k`

parameter, and have an additional `run`

function that, from the name, execute the relevant code (that was in the lambda-term).

In our code examples there are two lambda-terms: `(fun () -> f x; iter_cps f right k)`

and `(fun result -> result)`

. We will call the first `Left`

, as it marks the fact that we have already iterated on the left of the current tree, and the second one `Stop`

as it marks the end of the computation, the last continuation ever passed, to get the final result. But the `Left`

lambda-term does not only takes a parameter, it also captures the variable `x`

, `right`

and `k`

that are in its scope, and need to be saved for when the continuation will be executed. So `Left`

will be a constructor with three parameters, one for each of those variables:

```
type 'a kiter =
| Stop
| Left of 'a kiter * 'a * 'a tree
let rec iter_cps_defun f t k = match t with
| Leaf -> run f k
| Node (left, x, right) ->
iter_cps_defun f left (Left (k, x, right))
and run f = function
| Stop -> ()
| Left (k, x, right) -> f x; iter_cps_defun f right k
```

The new function `run`

takes a data representation of continuation, and runs the corresponding code. I want to emphasize that this transformation again requires no deep thinking, it is only a systematic source-to-source transformation.

You may have noticed that the type of our reified continuations, `'a kiter`

, looks very much like our `'a tree`

datatype. In fact, it is isomorphic to it (if we assume `'a kiter ≃ 'a tree`

, then `Left`

looks very much like `Node`

), so we can rewrite `iter_cps_defun`

with no specific continuation type, using `tree`

to store both data and continuations:

```
let rec iter_cps_defun f t k = match t with
| Leaf -> run f k
| Node (left, x, right) -> iter_cps_defun f left (Node (k, x, right))
and run f = function
| Leaf -> ()
| Node (k, x, right) -> f x; iter_cps_defun f right k
let to_iter t f = iter_cps_defun f t Leaf
```

If you squint your eyes a bit, what you have in front of you is a purely functional realization of a well-known traversal technique called *pointer reversal*. We call `iter`

with a pointer `k`

to "the rest of the work to be done", and when we descend in the `left`

subtree, we replace `k`

with a version of the parent tree `t`

where the left child points not to `left`

anymore, but to `k`

(the parent of `t`

). Of course, pointer reversal are useful for in-place traversals, so this realization is of little practical interest for an persistent immutable tree type where the nodes are copied anyway. But it's still nice to see how, by systematic application of general applications, we reach a place that is already well-known.

Finally, this implementation of the traversal function is low-level enough to be turned into a generator function. The idea is that instead of passing the `k`

argument recursively in an iterator, we can store it in a reference cell between calls to a generator.

```
let to_gen t =
let next = ref t in
let cont = ref Leaf in
let rec iter t k = match t with
| Leaf -> run k
| Node (left, x, right) -> iter left (Node (k, x, right))
and run = function
| Leaf -> raise Exit
| Node (k, x, right) ->
next := right;
cont := k;
x
in fun () -> iter !next !cont
```

This is the same code as before, except that instead of calling `f x`

(that's what you do when you have the control), we store the current state of the traversal, and return `x`

. The CPS transform is precisely what exposed this notion of "state of the traversal" in a way that is convenient to capture through calls.

Let me bore you with some performance details:

```
Tree.iter (1.59 ms) is 13.6% faster than
Tree.iter_cps_defun (1.84 ms) which is 14.7% faster than
Tree.iter_cps (2.16 ms)
```

So there is a cost to pay when inverting control, but in this case it is quite low. Remember than in general your running time will be dominated by what you want to do on each element of a structure, not its traversal itself, so measuring traversals only distort performance results. A 15% difference in a micro-benchmark means that users would most probably not notice the change.

Finally, I will remark that it is possible to revert control by using an off-the-shelf delimited continuation library. This is more general and modular, but may also be noticeably slower. I have tried doing that with Oleg's delightful Delimcc library, that is really not invasive (not requiring any change to the OCaml compiler or runtime); the code is funny (and possibly wrong, I'm not a Delimcc expert), but not competitive performance-wise because the continuation-capture operations are heavier, and not meant to be used at this granularity level.

```
(* this must be compiled with -rectypes *)
let to_gen_delim t =
let p = Delimcc.new_prompt () in
let next = ref begin
Delimcc.push_prompt p (fun () ->
to_seq t (fun x ->
Delimcc.take_subcont p (fun k () -> Some (x, k)));
None)
end in
fun () ->
match !next with
| None -> raise Exit
| Some (x, k) ->
next := Delimcc.push_delim_subcont k (fun () -> ());
x
```

This implements `to_gen`

(that doesn't have control) in terms of `to_seq`

(that requires control). The idea is that the iteration function that we feed to `to_seq`

, when called, captures the current continuation, that is the "rest of the traversal", and returns it along with the element that was passed. Each time our generator is called, it reinstates the continuation to keep computing until the next element, where it gets suspended again.

Note that this definition is not at all specific to trees, it could be defined generically. Furthermore, we can use the exact same technique to implement `of_seq`

in terms of `of_gen`

.

This code snippet must be compiled with `-rectypes`

because the type of the captured continuation `next`

is equi-recursive: it represents the rest of a computation that, when called, returns a reified value representing the rest of a computation that... It would be possible to look at this recursive type in the eye and define an explicit recursive algebraic datatype for it, to get rid of `-rectypes`

, but I was too lazy to do that.

#### Filling a tree

That was the two conversion functions *from* a tree, `to_iter`

and `to_gen`

. What about building a tree from iterator or generators?

First, there is a non-trivial question: how do you build a tree from a sequence? If you had a balanced search tree, you would have an `add`

function and would most probably build the tree by folding over the sequence elements, `add`

-ing one element at a time. But I want something that looks simpler (than maintaining search invariant), but that I actually found harder: fill the tree from left to right.

More precisely, I expect that giving a number of elements that is a power of two would return me a complete tree, with all leaves at the same heights, and that giving less elements would return me a tree with a complete left sub-tree, and a partially-filled right subtree.

Here is the implementation I got after a bit of tinkering. There may be other ways to implement this, and I'm open to suggestions of simpler approaches.

```
(** returns a tree of height [n], along with a boolean [finished]
indicating whether the generator was exhausted during the
construction. If [finished] is false, the tree is a complete tree
of height [n], otherwise it may be partial. *)
let rec fill f n =
match n with
| 0 -> false, Leaf
| _ ->
let finished, left = fill f (n - 1) in
if finished then finished, left
else fill_right f n left
(** taking a complete left subtree as input, it builds a node by
creating a possibly-partial right subtree *)
and fill_right f n left =
match (try Some (f ()) with Exit -> None) with
| None -> true, left
| Some x ->
let finished, right = fill f (n - 1) in
finished, Node (left, x, right)
(** iterate the filling functions above with ever-increasing values of
`n`, until the generator is exhausted. *)
let rec loop f n left =
match fill_right f n left with
| true, t -> t
| false, left -> loop f (n + 1) left
```

It is easy to define `of_gen`

from such a filling function:

```
let of_gen f = loop f 1 Leaf
```

To get a version without control (`of_iter`

), we should again turn these function into CPS form, and then defunctionalize the result. The CPS version is unsurprising; you don't need to read all the code, only look at the continuations following recursive calls.

```
let rec fill_cps f n k = match n with
| 0 -> k false Leaf
| _ ->
fill_cps f (n - 1) (fun finished left ->
if finished then k finished left
else fill_right_cps f n left k)
and fill_right_cps f n left k =
match (try Some (f ()) with Exit -> None) with
| None -> k true left
| Some x ->
fill_cps f (n - 1) (fun finished right ->
k finished (Node (left, x, right)))
let rec loop_cps f n left k =
fill_right_cps f n left (fun finished t ->
if finished then k t
else loop_cps f (n + 1) t k)
let of_gen gen = loop_cps gen 1 Leaf (fun x -> x)
```

Something interesting happens when doing defunctionalization. Currently, `fill`

and `fill_right`

are mutually recursive, and `loop`

is defined separately afterwards. But when doing defunctionalization, I will represent all the lambda-abstractions of the module with a single type, and interpret them through a single function `run`

which will be called whenever a continuation is invoked (you can think of `run`

as an explicit marker for the application of a function, represented as first-order data, to its argument(s)).

This means that `loop`

, where a continuation is applied in the `if finished then ...`

branch, will need to call `run`

, which will also be called from `still`

and `still_right`

. All four functions have to be defined as a single mutually recursive set of definitions. In other words, defunctionalization is *not* a modular transform, it must be defined at once as a "whole-program" transformation, operating on all functions sharing a common calling protocol.

```
type 'a kfill =
| Stop
| Left of 'a kfill * int
| Right of 'a tree * 'a * 'a kfill
| Loop of int * 'a kfill
let rec fill_cps_defun f n k = match n with
| 0 -> run f false Leaf k
| _ -> fill_cps_defun f (n - 1) (Left (k, n))
and fill_right_cps_defun f n left k =
match (try Some (f ()) with Exit -> None) with
| None -> run f true left k
| Some x ->
fill_cps_defun f (n - 1) (Right (left, x, k))
and loop_cps_defun f n left k = fill_right_cps_defun f n left (Loop (n, k))
and run f finished t = function
| Stop -> t
| Left (k, n) ->
if finished then run f finished t k
else fill_right_cps_defun f n t k
| Right (left, x, k) ->
run f finished (Node (left, x, t)) k
| Loop (n, k) ->
if finished then run f finished t k
else loop_cps_defun f (n + 1) t k
let of_gen gen = loop_cps_defun gen 1 Leaf Stop
```

From there, we will be able to derive our control-less conversion `of_iter`

. There is a last sophistication: in the `to_gen`

case, we only needed to call one function to get the next element, namely `iter`

. Here, we must start by calling `loop`

(to begin the construction of the tree), but the computation will need to be suspended after the next element is accessed (`try Some (f ()) ...`

), and restart from there, where `fill`

is called. So our mutable state will store not only the continuation argument to be passed to the next call, but also which function needs to be called next, `loop`

or `fill`

.

```
type 'a call1 =
| Call_loop of int * 'a tree * 'a kfill
| Call_fill of int * 'a kfill
let of_iter iter =
let st = ref (Call_loop (1, Leaf, Stop)) in
let result = ref None in
let rec fill f n k = match n with
| 0 -> run f false Leaf k
| _ -> fill f (n - 1) (Left (k, n))
and fill_right f n left k =
match f with
| None -> run f true left k
| Some x -> st := Call_fill (n - 1, Right (left, x, k))
and loop f n left k = fill_right f n left (Loop (n, k))
and run f finished t = function
| Stop -> result := Some t
| Left (k, n) ->
if finished then run f finished t k
else fill_right f n t k
| Right (left, x, k) ->
run f finished (Node (left, x, t)) k
| Loop (n, k) ->
if finished then run f finished t k
else loop f (n + 1) t k
in
let call x = function
| Call_loop (n, t, k) -> loop x n t k
| Call_fill (n, k) -> fill x n k
in
let add x = call (Some x) !st in
iter add;
call None !st;
match !result with
| None -> assert false
| Some res -> res
```

The amount of code may be a bit intimidating, but it is really only the previous version of the code, transformed to be called with a single element at a time (we still call it `f`

to avoid naming conflict with already-used `x`

), or `None`

when the input ends. By careful transformations we have successfully turned a generator-taking function, that has control, into an iterator-taking function that doesn't have control.

There is a rather important amount of book-keeping involved in this turned-around definition, so it is no surprise that `of_iter`

is substantially slower than `of_gen`

. It seems clear to me, however, that both should be included in a library. Only providing the controlling `of_gen`

means that the library user will have to code without control herself, leading to complicated code on her side (which is what your job, as a library writer, is meant to avoid), and less efficient code.

One way to improve performances for the without-control version would be to introduce buffering: instead of suspending computation after each element is obtained, we could run `fill`

to consume the next `N`

elements before the costly suspension happens.

This blog post is getting rather long so I'll keep this as a possible idea for the next, more advanced one. I'm not sure I will implement buffering, because I already have two more advanced remarks to make on the `fill`

code and its CPS versions. One is how the intermediate CPS version suggests a different representation with multiple return points, and the other is how we can use GADTs to avoid the rather inelegant `None -> assert false`

at the end of `of_iter`

. Stay tuned!

All the code I've written so far is available on gitorious. It uses DelimCC, `-rectypes`

and OCaml 4.00 (for the GADTs I just mentioned), but all these can be commented out, and the main body of the code is rather straightforward OCaml that I expect to compile and run anywhere.

Finally, if you are looking for research literature relevant to the techniques mentioned here, Olivier Danvy is the established master of systematic source-to-source transformations. He famously remarked, for example, how CPS-transforming then defunctionalizing a natural interpreter for a functional language gets you a corresponding *abstract machine* implementation of the language.

On defunctionalization in particular, I really like a 2006 article by François Pottier and Nadji Gauthier Polymorphic typed defunctionalization and concretization. I'll just quote the abstract below:

Defunctionalization is a program transformation that eliminates functions as first-class values. We show that defunctionalization can be viewed as a type-preserving transformation of an extension of with guarded algebraic data types into itself. We also suggest that defunctionalization is an instance of concretization, a more general technique that allows eliminating constructs other than functions. We illustrate this point by presenting two new type-preserving transformations that can be viewed as instances of concretization. One eliminates Rémy-style polymorphic records; the other eliminates the dictionary records introduced by the standard compilation scheme for Haskell's type classes.