Generators, iterators, control and continuations
- March 3, 2013
- Last updated on 2013/07/10
Batteries contributors recently started a new discussion on the Enum module, whose purpose is to provide a good intermediate representation for conversion between data structures, but whose implementation is perceived as too complicated and not well enough understood. Going back to fundamentals (forgetting about half the features of Enum), what is the ultimate intermediate traversal interface?
There are two obvious choices in an effectful setting: a generator, that is a “next-element” function of type (unit -> 'a) that returns the next element each time it is called, or raises an exception to signal the end of input; or an iterator, a fold-like function of type ('a -> unit) -> unit that iterates a consumer function through all elements of the sequence.
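For concreteness, here is what each representation could look like for the small sequence 1, 2, 3 (an illustrative sketch of mine, not part of the interfaces discussed here; I use the Exit exception to signal the end of input, as in the rest of this post):
(* a generator over 1, 2, 3: the consumer is in control and pulls one element per call *)
let gen_123 : unit -> int =
  let state = ref [1; 2; 3] in
  fun () -> match !state with
    | [] -> raise Exit
    | x :: xs -> state := xs; x
(* an iterator over 1, 2, 3: the producer is in control and pushes each element to the consumer f *)
let iter_123 : (int -> unit) -> unit =
  fun f -> f 1; f 2; f 3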
Both are suitable as an abstract representation of any sequence (the second is precisely the basis for Simon Cruanes’ Sequence library), and they are at opposite ends of the control spectrum: generators give control to the consumer of the sequence (she decides when to call the generator function), while iterators give control to the producer of the sequence (she decides when to call the iterated function). It’s easy to transform a structure into an iterator (to_iter), or to build a structure from a generator (of_gen), because you have the control. But as a library writer, you should also write the without-control conversions (of_iter and to_gen), despite them being slower and harder to implement, so that your users can live the easy in-control life!
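To make the asymmetry concrete, here is a sketch of mine (not taken from any particular library) of the easy, in-control direction; the reverse direction, turning an iterator into a generator, is exactly the control-inversion problem the rest of this post is about.
(* easy: the consumer of the generator has control, so building an iterator
   from a generator is a simple pull loop; the end of input is assumed to be
   signalled by the Exit exception, as elsewhere in this post *)
let iter_of_gen gen = fun f ->
  try while true do f (gen ()) done
  with Exit -> ()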
Finally, continuation capture is a well-known technique to invert control, letting you write code in “direct style”, as if you were in control, when you’re not. In this post, I will demonstrate how you can start from a conversion function that has control and, by systematic source-to-source transformations of conversion to continuation-passing-style (CPS) and defunctionalization, obtain a conversion function that works without the control.
We are interested in datatypes that support the following interface:
type 'a iter = ('a -> unit) -> unit
type 'a gen = unit -> 'a
module type Data = sig
type 'a t
val of_iter : 'a iter -> 'a t
val to_iter : 'a t -> 'a iter
val of_gen : 'a gen -> 'a t
val to_gen : 'a t -> 'a gen
end
Note that there is no support for predicting the size of the data structure: these will not be good interfaces for fixed-size structures such as array. We will use lists as a warm-up, and then concentrate on binary trees as a structure with more complicated traversal and construction patterns.
Lists as a warm-up
module List : Data with type 'a t = 'a list
= struct
type 'a t = 'a list
(* producer, with control *)
let to_iter li = fun f ->
let rec loop = function
| [] -> ()
| x :: xs -> f x; loop xs
in loop li
(* producer, without control *)
let to_gen li =
let st = ref li in
fun () -> match !st with
| [] -> raise Exit
| x::xs -> st := xs; x
(* consumer, with control *)
let of_gen gen =
let st = ref [] in
try while true do st := gen () :: !st done; raise Exit
with Exit -> List.rev !st
(* consumer, without control *)
let of_iter iter =
let st = ref [] in
iter (fun x -> st := x :: !st);
List.rev !st
end
Lists are no surprise. Both consumers, with or without control, are extremely simple to write. For producers, we would argue that to_iter is more convenient to express, and more elegant, than to_gen, but the difference is again small.
I did a bit of shallow performance testing, and unsurprisingly the version with control was always faster than the version without control. I also tested the compositions of_gen (to_gen li) and of_iter (to_iter li), and the gen version, with its simpler types, seems slightly faster – but it’s hard to get anything meaningful with code so simple.
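Concretely, these compositions are round-trips like the following (a usage sketch, assuming the List module defined above is in scope):
(* both round-trips should give back the original list *)
let () =
  assert (List.of_gen (List.to_gen [1; 2; 3]) = [1; 2; 3]);
  assert (List.of_iter (List.to_iter [1; 2; 3]) = [1; 2; 3])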
A generator for binary trees
The real meat of this post is the work on binary trees. Iterating over binary trees is easy, and will serve as a good example of the continuation-passing-style (CPS) transform. Building trees from sequences is more subtle, and will make for a more sophisticated application of the same principles.
We use binary trees with data on the nodes, and empty leaves.
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
Do you know the trick to create a tree of O(2^n)
nodes
in O(n)
time? It’s useless for this post, but I’ve been
using it for performance evaluation, so I’ll leave it here.
let rec deep = function
| 0 -> Leaf
| n -> let t = deep (n - 1) in Node (t, n, t)
let long_tree = deep 18
We will start with the simple and well-known iter
function:
let rec iter f = function
| Leaf -> ()
| Node (left, x, right) -> iter f left; f x; iter f right
let to_iter t f = iter f t
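As a quick sanity check of the in-order behaviour, here is a small usage sketch (the example tree is mine):
(* prints "1 2 3 ": left subtree, node, right subtree *)
let small_tree = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let () = to_iter small_tree (fun x -> Printf.printf "%d " x)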
We wish to get a generator from a tree (to_gen), starting from the code of the to_iter function. iter has the control: it decides when to call the function f. A single (sub-)call to the iter function may call f once, several times (recursively), or not at all. On the contrary, the difficulty in writing a generator is to understand how to advance precisely to the next element of the data structure, one step at a time, when the user calls the generator.
Our first transformation will be to turn iter into continuation-passing style. We have already covered this transformation in a previous blog series, as a tool to get tail-recursive implementations of any recursive function, trading stack space for heap-allocated closures. The idea is to capture the “context” (what will be done to its return value) of each recursive call as an additional parameter of the recursive function.
let rec iter_cps f t k = match t with
| Leaf -> k ()
| Node (left, x, right) ->
iter_cps f left (fun () ->
f x; iter_cps f right k)
let to_iter t f = iter_cps f t (fun result -> result)
Another way to see this transformation is to consider that we abstract over the “return address” of the function; we may call the continuation parameter return, and the function then reads | Leaf -> return (), etc.
The next transformation step is to remove the higher-order aspect associated with our representation of continuations as functions. We could in fact stop here, and derive a generator from this iterator in CPS, but using a first-order, data-centric representation will give us an algorithm that is closer to the one people tend to write naturally – and will also be a bit more efficient, but I won’t insist on performance considerations.
The idea of defunctionalization is to collect all the different lambda-terms that are used in the function, and give each of them a name (a symbol, a piece of data). Instead of the lambda-term, we will only pass the name as the k parameter, and have an additional run function that, from the name, executes the relevant code (that was in the lambda-term).
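As a tiny standalone illustration (a side example of mine, not part of the tree development), here is the transformation applied to a CPS-style list sum: its two lambda-terms become two constructors, interpreted by a run function.
(* higher-order version: continuations are closures *)
let rec sum_cps l k = match l with
  | [] -> k 0
  | x :: xs -> sum_cps xs (fun s -> k (x + s))
(* defunctionalized version: each lambda-term becomes a constructor,
   and run_ksum executes the code that the lambda-term used to contain *)
type ksum = Ksum_id | Ksum_add of int * ksum
let rec sum_defun l k = match l with
  | [] -> run_ksum k 0
  | x :: xs -> sum_defun xs (Ksum_add (x, k))
and run_ksum k s = match k with
  | Ksum_id -> s
  | Ksum_add (x, k) -> run_ksum k (x + s)
let () = assert (sum_cps [1; 2; 3] (fun s -> s) = sum_defun [1; 2; 3] Ksum_id)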
In the iter_cps code there are two lambda-terms: (fun () -> f x; iter_cps f right k) and (fun result -> result). We will call the first Left, as it marks the fact that we have already iterated on the left of the current tree, and the second one Stop, as it marks the end of the computation, the last continuation ever passed, used to get the final result. But the Left lambda-term does not only take a parameter: it also captures the variables x, right and k that are in its scope, which need to be saved for when the continuation is executed. So Left will be a constructor with three parameters, one for each of those variables:
type 'a kiter =
| Stop
| Left of 'a kiter * 'a * 'a tree
let rec iter_cps_defun f t k = match t with
| Leaf -> run f k
| Node (left, x, right) ->
iter_cps_defun f left (Left (k, x, right))
and run f = function
| Stop -> ()
| Left (k, x, right) -> f x; iter_cps_defun f right k
The new function run takes a data representation of a continuation, and runs the corresponding code. I want to emphasize that this transformation again requires no deep thinking: it is only a systematic source-to-source transformation.
You may have noticed that the type of our reified continuations,
'a kiter
, looks very much like our 'a tree
datatype. In fact, it is isomorphic to it (if we assume
'a kiter ≃ 'a tree
, then Left
looks very much
like Node
), so we can rewrite iter_cps_defun
with no specific continuation type, using tree
to store
both data and continuations:
let rec iter_cps_defun f t k = match t with
| Leaf -> run f k
| Node (left, x, right) -> iter_cps_defun f left (Node (k, x, right))
and run f = function
| Leaf -> ()
| Node (k, x, right) -> f x; iter_cps_defun f right k
let to_iter t f = iter_cps_defun f t Leaf
If you squint a bit, what you have in front of you is a purely functional realization of a well-known traversal technique called pointer reversal. We call iter with a pointer k to “the rest of the work to be done”, and when we descend into the left subtree, we replace k with a version of the parent tree t where the left child points not to left anymore, but to k (the parent of t). Of course, pointer reversal is useful for in-place traversals, so this realization is of little practical interest for a persistent immutable tree type where the nodes are copied anyway. But it’s still nice to see how, by systematic application of general transformations, we reach a place that is already well-known.
Finally, this implementation of the traversal function is low-level
enough to be turned into a generator function. The idea is that instead
of passing the k
argument recursively in an iterator, we
can store it in a reference cell between calls to a generator.
let to_gen t =
let next = ref t in
let cont = ref Leaf in
let rec iter t k = match t with
| Leaf -> run k
| Node (left, x, right) -> iter left (Node (k, x, right))
and run = function
| Leaf -> raise Exit
| Node (k, x, right) ->
next := right;
cont := k;
x
in fun () -> iter !next !cont
This is the same code as before, except that instead of calling f x (that’s what you do when you have the control), we store the current state of the traversal, and return x. The CPS transform is precisely what exposed this notion of “state of the traversal” in a way that is convenient to capture between calls.
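Here is a usage sketch of my own: pulling the in-order elements one call at a time, until Exit signals the end of the traversal.
let g = to_gen (Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf)))
let () =
  assert (g () = 1);
  assert (g () = 2);
  assert (g () = 3);
  assert (try ignore (g ()); false with Exit -> true)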
Let me bore you with some performance details:
Tree.iter (1.59 ms) is 13.6% faster than
Tree.iter_cps_defun (1.84 ms) which is 14.7% faster than
Tree.iter_cps (2.16 ms)
So there is a cost to pay when inverting control, but in this case it is quite low. Remember that in general your running time will be dominated by what you want to do on each element of a structure, not by the traversal itself, so measuring only the traversal distorts performance results. A 15% difference in a micro-benchmark means that users would most probably not notice the change.
Finally, I will remark that it is possible to invert control by using an off-the-shelf delimited continuation library. This is more general and modular, but may also be noticeably slower. I have tried doing that with Oleg’s delightful Delimcc library, which is really not invasive (it requires no change to the OCaml compiler or runtime); the code is funny (and possibly wrong, I’m not a Delimcc expert), but not competitive performance-wise, because the continuation-capture operations are heavier and not meant to be used at this level of granularity.
(* this must be compiled with -rectypes *)
let to_gen_delim t =
let p = Delimcc.new_prompt () in
let next = ref begin
Delimcc.push_prompt p (fun () ->
to_iter t (fun x ->
Delimcc.take_subcont p (fun k () -> Some (x, k)));
None)
end in
fun () ->
match !next with
| None -> raise Exit
| Some (x, k) ->
next := Delimcc.push_delim_subcont k (fun () -> ());
x
This implements to_gen (which doesn’t have control) in terms of to_iter (which requires control). The idea is that the iteration function that we feed to to_iter, when called, captures the current continuation, that is, the “rest of the traversal”, and returns it along with the element that was passed. Each time our generator is called, it reinstates the continuation to keep computing until the next element, where it gets suspended again.
Note that this definition is not at all specific to trees; it could be defined generically (see the sketch below). Furthermore, we can use the exact same technique to implement of_iter in terms of of_gen.
This code snippet must be compiled with -rectypes because the type of the captured continuation next is equi-recursive: it represents the rest of a computation that, when called, returns a reified value representing the rest of a computation that… It would be possible to look this recursive type in the eye and define an explicit recursive algebraic datatype for it, to get rid of -rectypes, but I was too lazy to do that.
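Following up on the remark above that this technique is not specific to trees, here is what a generic version could look like (my adaptation of the code above, abstracting over the iterator; same Delimcc calls, same -rectypes caveat):
(* hypothetical generic version: works for any 'a iter, not just trees *)
let gen_of_iter_delim (iter : 'a iter) : 'a gen =
  let p = Delimcc.new_prompt () in
  let next = ref begin
    Delimcc.push_prompt p (fun () ->
      iter (fun x ->
        Delimcc.take_subcont p (fun k () -> Some (x, k)));
      None)
  end in
  fun () ->
    match !next with
    | None -> raise Exit
    | Some (x, k) ->
      next := Delimcc.push_delim_subcont k (fun () -> ());
      x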
Filling a tree
Those were the two conversion functions from a tree, to_iter and to_gen. What about building a tree from an iterator or a generator?
First, there is a non-trivial question: how do you build a tree from a sequence? If you had a balanced search tree, you would have an add function and would most probably build the tree by folding over the sequence elements, add-ing one element at a time. But I want something that looks simpler than maintaining search invariants, yet that I actually found harder: filling the tree from left to right.
More precisely, I expect that giving a number of elements of the form 2^n - 1 (one less than a power of two) would return me a complete tree, with all leaves at the same depth, and that giving fewer elements would return me a tree with a complete left sub-tree and a partially-filled right subtree.
Here is the implementation I got after a bit of tinkering. There may be other ways to implement this, and I’m open to suggestions of simpler approaches.
(** returns a tree of height [n], along with a boolean [finished]
indicating whether the generator was exhausted during the
construction. If [finished] is false, the tree is a complete tree
of height [n], otherwise it may be partial. *)
let rec fill f n =
match n with
| 0 -> false, Leaf
| _ ->
let finished, left = fill f (n - 1) in
if finished then finished, left
else fill_right f n left
(** taking a complete left subtree as input, it builds a node by
creating a possibly-partial right subtree *)
and fill_right f n left =
match (try Some (f ()) with Exit -> None) with
| None -> true, left
| Some x ->
let finished, right = fill f (n - 1) in
finished, Node (left, x, right)
(** iterate the filling functions above with ever-increasing values of
[n], until the generator is exhausted. *)
let rec loop f n left =
match fill_right f n left with
| true, t -> t
| false, left -> loop f (n + 1) left
It is easy to define of_gen
from such a filling
function:
let of_gen f = loop f 1 Leaf
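As a usage sketch (with a hypothetical little helper turning a list into a generator), 2^n - 1 elements give a complete tree, while one more element starts a partially-filled right subtree:
(* hypothetical helper: a generator over the elements of a list *)
let gen_of_list l =
  let st = ref l in
  fun () -> match !st with
    | [] -> raise Exit
    | x :: xs -> st := xs; x
let t3 = of_gen (gen_of_list [1; 2; 3])
(* Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf)): complete, all leaves at depth 2 *)
let t4 = of_gen (gen_of_list [1; 2; 3; 4])
(* Node (t3, 4, Leaf): complete left subtree, partially-filled right subtree *)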
To get a version without control (of_iter), we should again turn these functions into CPS form, and then defunctionalize the result. The CPS version is unsurprising; you don’t need to read all the code, only look at the continuations following recursive calls.
let rec fill_cps f n k = match n with
| 0 -> k false Leaf
| _ ->
fill_cps f (n - 1) (fun finished left ->
if finished then k finished left
else fill_right_cps f n left k)
and fill_right_cps f n left k =
match (try Some (f ()) with Exit -> None) with
| None -> k true left
| Some x ->
fill_cps f (n - 1) (fun finished right ->
k finished (Node (left, x, right)))
let rec loop_cps f n left k =
fill_right_cps f n left (fun finished t ->
if finished then k t
else loop_cps f (n + 1) t k)
let of_gen gen = loop_cps gen 1 Leaf (fun x -> x)
Something interesting happens when doing defunctionalization.
Currently, fill
and fill_right
are mutually
recursive, and loop
is defined separately afterwards. But
when doing defunctionalization, I will represent all the
lambda-abstractions of the module with a single type, and interpret them
through a single function run
which will be called whenever
a continuation is invoked (you can think of run
as an
explicit marker for the application of a function, represented as
first-order data, to its argument(s)).
This means that loop, where a continuation is applied in the if finished then ... branch, will need to call run, which will also be called from fill and fill_right. All four functions have to be defined as a single mutually recursive set of definitions. In other words, defunctionalization is not a modular transform: it must be applied at once as a “whole-program” transformation, operating on all functions sharing a common calling protocol.
type 'a kfill =
| Stop
| Left of 'a kfill * int
| Right of 'a tree * 'a * 'a kfill
| Loop of int * 'a kfill
let rec fill_cps_defun f n k = match n with
| 0 -> run f false Leaf k
| _ -> fill_cps_defun f (n - 1) (Left (k, n))
and fill_right_cps_defun f n left k =
match (try Some (f ()) with Exit -> None) with
| None -> run f true left k
| Some x ->
fill_cps_defun f (n - 1) (Right (left, x, k))
and loop_cps_defun f n left k = fill_right_cps_defun f n left (Loop (n, k))
and run f finished t = function
| Stop -> t
| Left (k, n) ->
if finished then run f finished t k
else fill_right_cps_defun f n t k
| Right (left, x, k) ->
run f finished (Node (left, x, t)) k
| Loop (n, k) ->
if finished then run f finished t k
else loop_cps_defun f (n + 1) t k
let of_gen gen = loop_cps_defun gen 1 Leaf Stop
From there, we will be able to derive our control-less conversion of_iter. There is one last sophistication: in the to_gen case, we only needed to call one function to get the next element, namely iter. Here, we must start by calling loop (to begin the construction of the tree), but the computation will need to be suspended after each element is accessed (try Some (f ()) ...) and restarted from there, at the point where fill is called. So our mutable state will store not only the continuation argument to be passed to the next call, but also which function needs to be called next, loop or fill.
type 'a call1 =
| Call_loop of int * 'a tree * 'a kfill
| Call_fill of int * 'a kfill
let of_iter iter =
let st = ref (Call_loop (1, Leaf, Stop)) in
let result = ref None in
let rec fill f n k = match n with
| 0 -> run f false Leaf k
| _ -> fill f (n - 1) (Left (k, n))
and fill_right f n left k =
match f with
| None -> run f true left k
| Some x -> st := Call_fill (n - 1, Right (left, x, k))
and loop f n left k = fill_right f n left (Loop (n, k))
and run f finished t = function
| Stop -> result := Some t
| Left (k, n) ->
if finished then run f finished t k
else fill_right f n t k
| Right (left, x, k) ->
run f finished (Node (left, x, t)) k
| Loop (n, k) ->
if finished then run f finished t k
else loop f (n + 1) t k
in
let call x = function
| Call_loop (n, t, k) -> loop x n t k
| Call_fill (n, k) -> fill x n k
in
let add x = call (Some x) !st in
iter add;
call None !st;
match !result with
| None -> assert false
| Some res -> res
The amount of code may be a bit intimidating, but it is really only the previous version of the code, transformed to be called with a single element at a time (the element parameter is still named f, to avoid a naming conflict with the already-used x), or with None when the input ends. By careful transformations we have successfully turned a generator-taking function, which has control, into an iterator-taking function that doesn’t have control.
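As a quick usage sketch (my own example), any function of type ('a -> unit) -> unit is a valid iterator, so we can feed of_iter a hand-written one:
let t = of_iter (fun f -> f 1; f 2; f 3; f 4)
(* t = Node (Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf)), 4, Leaf),
   the same left-to-right filling as of_gen on the elements 1, 2, 3, 4 *)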
There is a fair amount of book-keeping involved in this turned-around definition, so it is no surprise that of_iter is substantially slower than of_gen. It seems clear to me, however, that both should be included in a library. Only providing of_gen, which keeps control, means that the library user will have to code without control herself, leading to complicated code on her side (which is what your job, as a library writer, is meant to avoid), and less efficient code.
One way to improve performance for the without-control version would be to introduce buffering: instead of suspending the computation after each element is obtained, we could run fill to consume the next N elements before the costly suspension happens.
This blog post is getting rather long, so I’ll keep this as a possible idea for the next, more advanced one. I’m not sure I will implement buffering, because I already have two more advanced remarks to make on the fill code and its CPS versions. One is how the intermediate CPS version suggests a different representation with multiple return points, and the other is how we can use GADTs to avoid the rather inelegant None -> assert false at the end of of_iter. Stay tuned!
All the code I’ve written so far is available on gitorious. It uses Delimcc, -rectypes and OCaml 4.00 (for the GADTs I just mentioned), but all of these can be commented out, and the main body of the code is rather straightforward OCaml that I expect to compile and run anywhere.
Finally, if you are looking for research literature relevant to the techniques mentioned here, Olivier Danvy is the established master of systematic source-to-source transformations. He famously remarked, for example, how CPS-transforming then defunctionalizing a natural interpreter for a functional language gets you a corresponding abstract machine implementation of the language.
On defunctionalization in particular, I really like a 2006 article by François Pottier and Nadji Gauthier, Polymorphic typed defunctionalization and concretization. I’ll just quote the abstract below:
Defunctionalization is a program transformation that eliminates functions as first-class values. We show that defunctionalization can be viewed as a type-preserving transformation of an extension of System F with guarded algebraic data types into itself. We also suggest that defunctionalization is an instance of concretization, a more general technique that allows eliminating constructs other than functions. We illustrate this point by presenting two new type-preserving transformations that can be viewed as instances of concretization. One eliminates Rémy-style polymorphic records; the other eliminates the dictionary records introduced by the standard compilation scheme for Haskell’s type classes.