Fixing stack overflows using continuations
- December 1, 2012
Just a few weeks ago, Andrej Bauer wrote two articles ( 1 2 ) about implementing a dependent type theory in Ocaml. In particular, he demonstrates a cool idea to normalize expressions, called « normalization by evaluation ».
However, when trying to evaluate 220, the program overflows the stack. It turns out that there is an easy and methodic way to locate and fix such problems, which we will discuss in that post.
Edit: there is now a third article published in the series, but the code presented here is still based on the Part II branch of Andrej’s git repository.
First, let’s examine a similar, working example. From Andrej’s test, we can compute 216:
$ make native && echo "Eval power two (power two four)." | ./tt.native -l church.tt
ocamlbuild -lib unix -use-menhir tt.native
Finished, 38 targets (38 cached) in 00:00:00.
tt master
[Type Ctrl-D to exit or "Help." for help.]
# = fun A : Type 0 =>
fun x : A -> A =>
fun x0 : A =>
x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x (x (x (x (x (x (x (x (x (... (...))))))))))))))))))))))))))))))))))))
: forall A : Type 0, (A -> A) -> A -> A
This is just the regular Church-encoding of a huge number: you can
think of A as the natural numbers type, of x
as the
successor constructor, and x0
as zero. Notice how the
result is a very deeply-nested sequence of applications. The shape being
pretty-printed here relates closely to the manipulated datatype, and as
we will see, it is the source of the overflow.
Let us now modify the Makefile to add debug information to get a proper backtrace:
native:
- ocamlbuild -lib unix -use-menhir tt.native
+ ocamlbuild -tag annot -tag debug -lib unix -use-menhir tt.native
Rebuilding and running it on 220 (setting
OCAMLRUNPARAM=“b”
to print the backtrace when the stack
overflows):
$ make native && echo "Eval power two (times four five)." | OCAMLRUNPARAM="b" ./tt.native -l church.tt
ocamlbuild -tag annot -tag debug -lib unix -use-menhir tt.native
Finished, 38 targets (15 cached) in 00:00:00.
tt master
[Type Ctrl-D to exit or "Help." for help.]
# Fatal error: exception Stack_overflow
Raised by primitive operation at file "value.ml", line 74, characters 33-43
Called from file "value.ml", line 74, characters 33-43
Called from file "value.ml", line 90, characters 47-54
Called from file "value.ml", line 74, characters 33-43
Called from file "value.ml", line 90, characters 47-54
Called from file "value.ml", line 74, characters 33-43
Called from file "value.ml", line 90, characters 47-54
Called from file "value.ml", line 74, characters 33-43
[...]
The problem here is that the code tries to reify the result whose body contains the deep nesting of application nodes. When recursively processing a term, the call stack is proportional to the “height” of the term as a tree; our result here is a tree that is deeply skewed on the right, with height around 220. Location information tells us which recursive processing step fails (two mutually recursive calls), and the guilty function here is the following:
and reify_abstraction (x, t, f) =
let x = Syntax.refresh x in
(x, reify t, reify (f (Neutral (Var x))))
It appears that the call to f (Neutral (Var x))
blows
the stack. To solve the problem (or at least this part of the problem),
we must somehow put this call in tail-call position; it is currently
surrounded by some post-treatment that builds the
(x, reify t, ...)
triple from the result of the call.
The idea of using continuation-passing-style (CPS) is to bluntly put
the call in tail-call position, and pass it the “post-treatment” to be
done as a function (continuation) called by f
itself when
it produces its result. “Don’t call us, we’ll call you”: this is an
inversion of control.
and reify_abstraction (x, t, f) =
let x = Syntax.refresh x in
fun body -> (x, reify t, reify body)) f (Neutral (Var x)) (
For typing reason (the reification is polymorphic), this simple style doesn’t quite work here, we must explicitly represent the continuation value with a first-class polymorphic type (as a record field);
and reify_abstraction (x, t, f) =
let x = Syntax.refresh x in
(f (Neutral (Var x))).callfun body -> (x, reify t, reify body)) (
The type abstraction
accepted by the function needs to
be changed from:
and abstraction = variable * value * (value -> value)
into:
and abstraction =
variable * value * (value -> value continuation)
and 'a continuation = { call : 'b . ('a -> 'b) -> 'b }
We have to fix some parts of the program to adapt to that change.
When we before used an abstraction body directly as a function,
f x
we now need to pass it a continuation. In
reify_abstraction
we were carefull to pass just the right
continuation to have a tail-call, but the other uses are not critical
for our input, so we can just pass them the identity continuation, that
immediately returns the result.
For example in equal_abstraction
,
and equal_abstraction (x, v1, f1) (_, v2, f2) =
equal v1 v2 &&let x = Neutral (Var (Syntax.refresh x)) in
( equal (f1 x) (f2 x))
The last line becomes:
equal ((f1 x).call id) ((f2 x).call id))
The three other changes can be seen in the corresponding commit.
Let’s put this to test:
$ make native && echo "Eval power two (times four five)." | OCAMLRUNPARAM="b" ./tt.native -l church.tt
ocamlbuild -tag annot -tag debug -lib unix -use-menhir tt.native
Finished, 38 targets (30 cached) in 00:00:00.
tt master
[Type Ctrl-D to exit or "Help." for help.]
# Fatal error: exception Stack_overflow
Raised at file "list.ml", line 157, characters 19-30
Called from file "value.ml", line 78, characters 33-43
Called from file "value.ml", line 95, characters 47-54
Called from file "value.ml", line 78, characters 33-43
Called from file "value.ml", line 95, characters 47-54
Called from file "value.ml", line 78, characters 33-43
Called from file "value.ml", line 95, characters 47-54
Called from file "value.ml", line 78, characters 33-43
[...]
Not there yet: we have made one particular part of the program tail-recursive, but we still need to fix the rest. The next problem arises in the treatment of the right-hand side (RHS) of application:
and reify_neutral' = function
| Var x -> Syntax.Var x | App (n, v) -> Syntax.App (reify_neutral n, reify v)
As the problem on our particular input comes from the right arguments
of application nodes, the call we would like to make tail-recursive here
is reify v
.
This part of the program is less complicated from a typing point of
view, so we don’t need the explicit record wrapping for polymorphism.
However, the fix is larger as we we are in the middle of several
mutually recursive functions (reify
, reify'
,
reify_neutral'
, reify_neutral
) that all need
to be made CPS somehow.
Fortunately, this can be done step-by-step. First turn this
reify v
call in a tail-call to a yet-to-be-defined
reify_cont
function:
| App (n, v) ->fun v -> Syntax.App (reify_neutral n, v)) reify_cont v (
The simplest way to define reify_cont v ret
(where
ret
is the name of the continuation, of which we can think
of as a “return”) is the following:
and reify_cont v ret = ret (reify v)
Of course, this definition doesn’t solve anything, as it’s no more tail-recursive than the initial version. But we can iteratively refine this definition to expand the continuation-passing-style.
First, inline the definition of reify
:
and reify_cont v ret =
ret (Syntax.nowhere (reify' v))
Then turn this into a tail-call to a new reify'_cont
function, naively defined. Note how the continuation passed to
reify'_cont
first applies the post-treatment of the
reify
function, then call the passed-in continuation
ret
.
and reify_cont v ret =
fun v -> ret (Syntax.nowhere v))
reify'_cont v (
and reify'_cont v ret = ret (reify' v)
It should now be routine: inline the call, then specialize the definition to expose a tail-call.
and reify'_cont v ret = match v with
| Universe _ | Pi _ | Lambda _ -> ret (reify' v) | Neutral n -> reify_neutral'_cont n ret
Note that we have selectively handled the Neutral
case,
which is where the stack overflow happens and needs to be made tail-rec,
while all the other cases reuse the non-CPS reify'
function: adding CPS versions of our code duplicate some logic, but only
in the code paths that need to be stack-preserving.
The definition of reify_neutral'_cont
again follows the
same principle, but chains together the CPS versions of
reify
and reify_neutral
:
and reify_neutral'_cont n ret = match n with
| Var x -> ret (Syntax.Var x)
| App (n, v) ->fun v ->
reify_cont v (fun n ->
reify_neutral_cont n (
ret (Syntax.App (n, v))))
and reify_neutral_cont n ret =
fun n -> ret (Syntax.nowhere n)) reify_neutral'_cont n (
This closes the loop: we have made a _cont
version of
all the functions involved in the mutual recursion that overflows the
stack. And they were derived from their non-CPS definition in an simple
and methodic way.
Of course, this is still not enough:
$ make native && echo "Eval power two (times four five)." | OCAMLRUNPARAM="b" ./tt.native -l church.tt
ocamlbuild -tag annot -tag debug -lib unix -use-menhir tt.native
Finished, 38 targets (30 cached) in 00:00:00.
tt master
[Type Ctrl-D to exit or "Help." for help.]
# = Fatal error: exception Stack_overflow
Raised at file "pervasives.ml", line 149, characters 10-33
Called from file "beautify.ml", line 64, characters 14-41
Called from file "beautify.ml", line 53, characters 27-56
Called from file "beautify.ml", line 64, characters 14-41
Called from file "beautify.ml", line 53, characters 27-56
[...]
This time, we manage to reify the result, but encounter a new overflow in the part of the code that prepares the result for pretty-printing, more specifically and as expected, in the handling of the RHS of applications.
The same transformation method into CPS applies here again, as well as one step further in the pretty-printer code. Fixing each part of the code in this methodic way, if a bit repetitive, is rather quick.
The commits associated to the fixes described in this post can be found on Gabriel Scherer’s github repository.
Once this is done, we finally get a result (whose pretty-printing is not that interesting):
$ make native && echo "Eval power two (times four five)." | OCAMLRUNPARAM="b" ./tt.native -l church.tt ocamlbuild -tag annot -tag debug -lib unix -use-menhir tt.native
Finished, 38 targets (28 cached) in 00:00:00.
tt master
[Type Ctrl-D to exit or "Help." for help.]
# = fun A : Type 0 =>
fun x : A -> A =>
fun x0 : A =>
x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x
(x (x (x (x (x (x (x (x (x (... (...))))))))))))))))))))))))))))))))))))
: forall A : Type 0, (A -> A) -> A -> A
# %