Batteries contributors recently started a new discussion on the Enum module, whose purpose is to provide an good intermediate representation for conversion between data-structures, but whose implementation is perceived as too complicated and not well-understood enough. Going back to fundamentals (forgetting about half the features of Enum), what is the ultimate intermediate traversal interface?
There are two obvious choices in an effectful setting: a
generator, that is a “next-element” function of type
(unit -> 'a) that generates the next element each time
it is called, or an exception to signal end of input; or an
iterator, a fold-function of type
('a -> unit) -> unit, that iterates a consumer
function through all elements of the list.
Both are suitable as an abstract representation of any sequence (the
second is precisely the basis for Simon Cruanes’ Sequence library), and
their are at opposite ends of the control spectrum: generators
give control to the consumer of the sequence (she decides when
to call the generator function), while iterators give control to the
producer of the sequence (she decides when to call the iterated
function). It’s easy to transform a structure into a iter
to_iter), or to build a structure from a next
of_next), because you have the control. But as a library
writer, you should also write the without-control conversions
to_next), despite them being
slower and harder to implement, so that your users can live the easy
Finally, continuation capture is a well-known technique to invert control, letting you write code in “direct style”, as if you were in control, when you’re not. In this post, I will demonstrate how you can start from a conversion function that has control and, by systematic source-to-source transformations of conversion to continuation-passing-style (CPS) and defunctionalization, obtain a conversion function that works without the control.
We are interested in datatypes that support the following interface:
type 'a iter = ('a -> unit) -> unit type 'a gen = unit -> 'a module type Data = sig type 'a t val of_iter : 'a iter -> 'a t val to_iter : 'a t -> 'a iter val of_gen : 'a gen -> 'a t val to_gen : 'a t -> 'a gen end
Note that there is no support for predicting the size of the data
structure: those will not be good interfaces for fixed-size structures
array. We will use lists as a warm-ups, and then
concentrate on binary trees as a structure with more complicated
traversal and construction patterns.
Lists as a warm-up
module List : Data with type 'a t = 'a list = struct type 'a t = 'a list (* producer, with control *) let to_iter li = fun f -> let rec loop = function |  -> () | x :: xs -> f x; loop xs in loop li (* producer, without control *) let to_gen li = let st = ref li in fun () -> match !st with |  -> raise Exit | x::xs -> st := xs; x (* consumer, with control *) let of_gen gen = let st = ref  in try while true do st := gen () :: !st done; raise Exit with Exit -> List.rev !st (* consumer, without control *) let of_iter iter = let st = ref  in iter (fun x -> st := x :: !st); List.rev !st end
Lists are no surprise. Both consumers, with or without control, are
extremely simple to write. For producers, we would argue that
to_iter is more convenient to express, and more elegant,
to_gen, but the difference is again small.
I did a bit of shallow performance testing, and without surprise the
version with control was always faster than the version without control.
I also tested the compositions
of_gen (to_gen li) and
of_seq (to_seq li) and the
gen versions, with
the simpler types, seems slightly faster – but it’s hard to get anything
meaningful with code so simple.
A generator for binary trees
The real meat of this post is the work on binary trees. Iterating on binary trees is easy, and will serve a good example of continuation-passing-style (CPS) transform. Building trees from sequences is more subtle, and will make for a more sophisticated application of the same principles.
We use binary trees with data on the nodes, and empty leaves.
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
Do you know the trick to create a tree of
O(n) time? It’s useless for this post, but I’ve been
using it for performance evaluation, so I’ll leave it here.
let rec deep = function | 0 -> Leaf | n -> let t = deep (n - 1) in Node (t, n, t) let long_tree = deep 18
We will start with the simple and well-known
let rec iter f = function | Leaf -> () | Node (left, x, right) -> iter f left; f x; iter f right let to_iter t f = iter f t
We wish to get a generator from a tree
to_gen), starting from the code of the
iter has the control, it
decides when to call the function
f. A single (sub-)call to
the iter function may call
f one time, several times
(recursively), or not at all. On the contrary, the difficulty in writing
a generator is to understand how to advance precisely to the next
element of the data-structure, one step at the time, when the user calls
Our first transformation will be to turn
continuation-passing-style. We have already covered this transformation
a previous blog series, as a tool to get tail-recursive
implementations of any recursive functions, trading stack space for
heap-allocated closures. The idea is to capture the “context” (what will
be done to its return value) of each recursive call as an additional
parameter of the recursive function.
let rec iter_cps f t k = match t with | Leaf -> k () | Node (left, x, right) -> iter_cps f left (fun () -> f x; iter_cps f right k) let to_iter t f = iter_cps f t (fun result -> result)
Another way to see this transformation is to consider that we
abstract on the “return address” of the function; we may call the
return, and the function then reads
| Leaf -> return (), etc.
The next transformation step is to remove the higher-order aspect associated to our representation of continuation as functions. We could in fact stop here, and derive a generator from this iterator in CPS, but using a first-order data-centric representation will give us an algorithm that is closer to the algorithm that people tend to write naturally – and will also be a bit more efficient, but I won’t insist on performance considerations.
The idea of defunctionalization is to collect all the
different lambda-terms that are used in the function, and give each of
them a name (a symbol, a piece of data). Instead of the lambda-term, we
will only pass the name as the
k parameter, and have an
run function that, from the name, execute the
relevant code (that was in the lambda-term).
In our code examples there are two lambda-terms:
(fun () -> f x; iter_cps f right k) and
(fun result -> result). We will call the first
Left, as it marks the fact that we have already iterated on
the left of the current tree, and the second one
Stop as it
marks the end of the computation, the last continuation ever passed, to
get the final result. But the
Left lambda-term does not
only takes a parameter, it also captures the variable
k that are in its scope, and need to
be saved for when the continuation will be executed. So
Left will be a constructor with three parameters, one for
each of those variables:
type 'a kiter = | Stop | Left of 'a kiter * 'a * 'a tree let rec iter_cps_defun f t k = match t with | Leaf -> run f k | Node (left, x, right) -> iter_cps_defun f left (Left (k, x, right)) and run f = function | Stop -> () | Left (k, x, right) -> f x; iter_cps_defun f right k
The new function
run takes a data representation of
continuation, and runs the corresponding code. I want to emphasize that
this transformation again requires no deep thinking, it is only a
systematic source-to-source transformation.
You may have noticed that the type of our reified continuations,
'a kiter, looks very much like our
datatype. In fact, it is isomorphic to it (if we assume
'a kiter ≃ 'a tree, then
Left looks very much
Node), so we can rewrite
with no specific continuation type, using
tree to store
both data and continuations:
let rec iter_cps_defun f t k = match t with | Leaf -> run f k | Node (left, x, right) -> iter_cps_defun f left (Node (k, x, right)) and run f = function | Leaf -> () | Node (k, x, right) -> f x; iter_cps_defun f right k let to_iter t f = iter_cps_defun f t Leaf
If you squint your eyes a bit, what you have in front of you is a
purely functional realization of a well-known traversal technique called
pointer reversal. We call
iter with a pointer
k to “the rest of the work to be done”, and when we descend
left subtree, we replace
k with a
version of the parent tree
t where the left child points
left anymore, but to
k (the parent of
t). Of course, pointer reversal are useful for in-place
traversals, so this realization is of little practical interest for an
persistent immutable tree type where the nodes are copied anyway. But
it’s still nice to see how, by systematic application of general
applications, we reach a place that is already well-known.
Finally, this implementation of the traversal function is low-level
enough to be turned into a generator function. The idea is that instead
of passing the
k argument recursively in an iterator, we
can store it in a reference cell between calls to a generator.
let to_gen t = let next = ref t in let cont = ref Leaf in let rec iter t k = match t with | Leaf -> run k | Node (left, x, right) -> iter left (Node (k, x, right)) and run = function | Leaf -> raise Exit | Node (k, x, right) -> next := right; cont := k; x in fun () -> iter !next !cont
This is the same code as before, except that instead of calling
f x (that’s what you do when you have the control), we
store the current state of the traversal, and return
CPS transform is precisely what exposed this notion of “state of the
traversal” in a way that is convenient to capture through calls.
Let me bore you with some performance details:
Tree.iter (1.59 ms) is 13.6% faster than Tree.iter_cps_defun (1.84 ms) which is 14.7% faster than Tree.iter_cps (2.16 ms)
So there is a cost to pay when inverting control, but in this case it is quite low. Remember than in general your running time will be dominated by what you want to do on each element of a structure, not its traversal itself, so measuring traversals only distort performance results. A 15% difference in a micro-benchmark means that users would most probably not notice the change.
Finally, I will remark that it is possible to revert control by using an off-the-shelf delimited continuation library. This is more general and modular, but may also be noticeably slower. I have tried doing that with Oleg’s delightful Delimcc library, that is really not invasive (not requiring any change to the OCaml compiler or runtime); the code is funny (and possibly wrong, I’m not a Delimcc expert), but not competitive performance-wise because the continuation-capture operations are heavier, and not meant to be used at this granularity level.
(* this must be compiled with -rectypes *) let to_gen_delim t = let p = Delimcc.new_prompt () in let next = ref begin Delimcc.push_prompt p (fun () -> to_seq t (fun x -> Delimcc.take_subcont p (fun k () -> Some (x, k))); None) end in fun () -> match !next with | None -> raise Exit | Some (x, k) -> next := Delimcc.push_delim_subcont k (fun () -> ()); x
to_gen (that doesn’t have control) in
to_seq (that requires control). The idea is that
the iteration function that we feed to
to_seq, when called,
captures the current continuation, that is the “rest of the traversal”,
and returns it along with the element that was passed. Each time our
generator is called, it reinstates the continuation to keep computing
until the next element, where it gets suspended again.
Note that this definition is not at all specific to trees, it could
be defined generically. Furthermore, we can use the exact same technique
of_seq in terms of
This code snippet must be compiled with
because the type of the captured continuation
equi-recursive: it represents the rest of a computation that, when
called, returns a reified value representing the rest of a computation
that… It would be possible to look at this recursive type in the eye and
define an explicit recursive algebraic datatype for it, to get rid of
-rectypes, but I was too lazy to do that.
Filling a tree
That was the two conversion functions from a tree,
to_gen. What about building a tree
from iterator or generators?
First, there is a non-trivial question: how do you build a tree from
a sequence? If you had a balanced search tree, you would have an
add function and would most probably build the tree by
folding over the sequence elements,
add-ing one element at
a time. But I want something that looks simpler (than maintaining search
invariant), but that I actually found harder: fill the tree from left to
More precisely, I expect that giving a number of elements that is a power of two would return me a complete tree, with all leaves at the same heights, and that giving less elements would return me a tree with a complete left sub-tree, and a partially-filled right subtree.
Here is the implementation I got after a bit of tinkering. There may be other ways to implement this, and I’m open to suggestions of simpler approaches.
(** returns a tree of height [n], along with a boolean [finished] indicating whether the generator was exhausted during the construction. If [finished] is false, the tree is a complete tree of height [n], otherwise it may be partial. *) let rec fill f n = match n with | 0 -> false, Leaf | _ -> let finished, left = fill f (n - 1) in if finished then finished, left else fill_right f n left (** taking a complete left subtree as input, it builds a node by creating a possibly-partial right subtree *) and fill_right f n left = match (try Some (f ()) with Exit -> None) with | None -> true, left | Some x -> let finished, right = fill f (n - 1) in finished, Node (left, x, right) (** iterate the filling functions above with ever-increasing values of `n`, until the generator is exhausted. *) let rec loop f n left = match fill_right f n left with | true, t -> t | false, left -> loop f (n + 1) left
It is easy to define
of_gen from such a filling
let of_gen f = loop f 1 Leaf
To get a version without control (
of_iter), we should
again turn these function into CPS form, and then defunctionalize the
result. The CPS version is unsurprising; you don’t need to read all the
code, only look at the continuations following recursive calls.
let rec fill_cps f n k = match n with | 0 -> k false Leaf | _ -> fill_cps f (n - 1) (fun finished left -> if finished then k finished left else fill_right_cps f n left k) and fill_right_cps f n left k = match (try Some (f ()) with Exit -> None) with | None -> k true left | Some x -> fill_cps f (n - 1) (fun finished right -> k finished (Node (left, x, right))) let rec loop_cps f n left k = fill_right_cps f n left (fun finished t -> if finished then k t else loop_cps f (n + 1) t k) let of_gen gen = loop_cps gen 1 Leaf (fun x -> x)
Something interesting happens when doing defunctionalization.
fill_right are mutually
loop is defined separately afterwards. But
when doing defunctionalization, I will represent all the
lambda-abstractions of the module with a single type, and interpret them
through a single function
run which will be called whenever
a continuation is invoked (you can think of
run as an
explicit marker for the application of a function, represented as
first-order data, to its argument(s)).
This means that
loop, where a continuation is applied in
if finished then ... branch, will need to call
run, which will also be called from
still_right. All four functions have to be defined as a
single mutually recursive set of definitions. In other words,
defunctionalization is not a modular transform, it must be
defined at once as a “whole-program” transformation, operating on all
functions sharing a common calling protocol.
type 'a kfill = | Stop | Left of 'a kfill * int | Right of 'a tree * 'a * 'a kfill | Loop of int * 'a kfill let rec fill_cps_defun f n k = match n with | 0 -> run f false Leaf k | _ -> fill_cps_defun f (n - 1) (Left (k, n)) and fill_right_cps_defun f n left k = match (try Some (f ()) with Exit -> None) with | None -> run f true left k | Some x -> fill_cps_defun f (n - 1) (Right (left, x, k)) and loop_cps_defun f n left k = fill_right_cps_defun f n left (Loop (n, k)) and run f finished t = function | Stop -> t | Left (k, n) -> if finished then run f finished t k else fill_right_cps_defun f n t k | Right (left, x, k) -> run f finished (Node (left, x, t)) k | Loop (n, k) -> if finished then run f finished t k else loop_cps_defun f (n + 1) t k let of_gen gen = loop_cps_defun gen 1 Leaf Stop
From there, we will be able to derive our control-less conversion
of_iter. There is a last sophistication: in the
to_gen case, we only needed to call one function to get the
next element, namely
iter. Here, we must start by calling
loop (to begin the construction of the tree), but the
computation will need to be suspended after the next element is accessed
try Some (f ()) ...), and restart from there, where
fill is called. So our mutable state will store not only
the continuation argument to be passed to the next call, but also which
function needs to be called next,
type 'a call1 = | Call_loop of int * 'a tree * 'a kfill | Call_fill of int * 'a kfill let of_iter iter = let st = ref (Call_loop (1, Leaf, Stop)) in let result = ref None in let rec fill f n k = match n with | 0 -> run f false Leaf k | _ -> fill f (n - 1) (Left (k, n)) and fill_right f n left k = match f with | None -> run f true left k | Some x -> st := Call_fill (n - 1, Right (left, x, k)) and loop f n left k = fill_right f n left (Loop (n, k)) and run f finished t = function | Stop -> result := Some t | Left (k, n) -> if finished then run f finished t k else fill_right f n t k | Right (left, x, k) -> run f finished (Node (left, x, t)) k | Loop (n, k) -> if finished then run f finished t k else loop f (n + 1) t k in let call x = function | Call_loop (n, t, k) -> loop x n t k | Call_fill (n, k) -> fill x n k in let add x = call (Some x) !st in iter add; call None !st; match !result with | None -> assert false | Some res -> res
The amount of code may be a bit intimidating, but it is really only
the previous version of the code, transformed to be called with a single
element at a time (we still call it
f to avoid naming
conflict with already-used
the input ends. By careful transformations we have successfully turned a
generator-taking function, that has control, into an iterator-taking
function that doesn’t have control.
There is a rather important amount of book-keeping involved in this
turned-around definition, so it is no surprise that
is substantially slower than
of_gen. It seems clear to me,
however, that both should be included in a library. Only providing the
of_gen means that the library user will have to
code without control herself, leading to complicated code on her side
(which is what your job, as a library writer, is meant to avoid), and
less efficient code.
One way to improve performances for the without-control version would
be to introduce buffering: instead of suspending computation after each
element is obtained, we could run
fill to consume the next
N elements before the costly suspension happens.
This blog post is getting rather long so I’ll keep this as a possible
idea for the next, more advanced one. I’m not sure I will implement
buffering, because I already have two more advanced remarks to make on
fill code and its CPS versions. One is how the
intermediate CPS version suggests a different representation with
multiple return points, and the other is how we can use GADTs to avoid
the rather inelegant
None -> assert false at the end of
of_iter. Stay tuned!
All the code I’ve written so far is available on
gitorious. It uses DelimCC,
-rectypes and OCaml 4.00
(for the GADTs I just mentioned), but all these can be commented out,
and the main body of the code is rather straightforward OCaml that I
expect to compile and run anywhere.
Finally, if you are looking for research literature relevant to the techniques mentioned here, Olivier Danvy is the established master of systematic source-to-source transformations. He famously remarked, for example, how CPS-transforming then defunctionalizing a natural interpreter for a functional language gets you a corresponding abstract machine implementation of the language.
On defunctionalization in particular, I really like a 2006 article by François Pottier and Nadji Gauthier Polymorphic typed defunctionalization and concretization. I’ll just quote the abstract below:
Defunctionalization is a program transformation that eliminates functions as first-class values. We show that defunctionalization can be viewed as a type-preserving transformation of an extension of with guarded algebraic data types into itself. We also suggest that defunctionalization is an instance of concretization, a more general technique that allows eliminating constructs other than functions. We illustrate this point by presenting two new type-preserving transformations that can be viewed as instances of concretization. One eliminates Rémy-style polymorphic records; the other eliminates the dictionary records introduced by the standard compilation scheme for Haskell’s type classes.