## Using Coq's evaluation mechanisms in anger

- June 23, 2015

Coq offers several internal evaluation mechanisms that have many uses, from glorious proofs by reflection to mundane testing of functions. Read on for an account of my quest to get large parts of the CompCert verified C compiler to execute from within Coq. It was a mighty battle between the forces of transparency and opacity, with the innocent-looking “decide equality” tactic as the surprise villain…

Recently, I spent significant time trying to execute parts of the CompCert verified C compiler directly within Coq. The normal execution path for CompCert is to extract (using Coq’s extraction mechanism) OCaml code from the Coq function definitions of CompCert, then compile this OCaml code and link it with hand-written OCaml code. Executing Coq definitions from within Coq itself bypasses extraction and provides a more lightweight way to run them on sample inputs.

That’s the theory; in practice, I ran into many problems, some specific to CompCert, others of a more general nature. In this post, I describe some of the “gotcha’s” I ran into. Most of this material is old news for expert Coq users, but writing it down could help others avoiding the pitfalls of Coq’s execution mechanisms.

### Coq’s evaluation mechanisms

Readers unfamiliar with Coq may wonder why a proof assistant should be able to evaluate functional programs at all. Owing to the conversion rule (types/propositions are identified up to reductions), evaluation is an integral part of Coq’s logic. The Coq implementation provides several evaluation mechanisms:

`compute`

, an interpreter that supports several evaluation strategies (lazy, call-by-value, etc).`vm_compute`

, which relies on compilation to the bytecode of a virtual machine, implements call-by-value evaluation, and delivers performance comparable to that of the OCaml bytecode compiler.`native_compute`

, introduced in the upcoming 8.5 release of Coq, relies on the OCaml native-code compiler to produce even higher performance.

Here is an example of evaluation:

```
Require Import Zarith.
Open Scope z_scope.
Fixpoint fib (n: nat) : Z :=
match n with 0%nat => 1 | 1%nat => 1 | S (S n2 as n1) => fib n1 + fib n2 end.
Compute (fib 30%nat).
```

The Coq toplevel prints `= 1346269 : Z`

, taking about 0.2s
for the evaluation. The `Compute`

command is shorthand for
`Eval vm_compute in`

, and therefore uses the virtual machine
evaluator. We can also use the interpreter instead, obtaining the same
results, only slower:

```
Eval cbv in (fib 30%nat). (* takes 2.3s *)
Eval lazy in (fib 30%nat). (* takes 16s *)
```

On the “ML subset” of Coq, like the example above, Coq’s evaluators behave exactly as one would expect. This “ML subset” comprises non-dependent data types and plain function definitions. Other features of Coq are more problematic evaluation-wise, as we now illustrate.

### Beware opacity!

The most common source of incomplete evaluations is opaque names. An
obvious example is names that are declared using `Parameter`

or `Axiom`

but not given a definition. Consider:

```
Parameter oracle: bool.
Compute (if oracle then 2+2 else 3+3).
(* = if oracle then 4 else 6 : Z *)
```

Coq is doing its best here, evaluating the `then`

and
`else`

branches to 4 and 6, respectively. Since it does not
know the value of the `oracle`

Boolean, it is of course
unable to reduce the `if`

entirely.

A less obvious source of opacity is definitions conducted in
interactive proof mode and terminated by `Qed`

:

```
Definition nat_eq (x y: nat) : {x=y} + {x<>y}.
Proof. decide equality. Qed.
Compute (nat_eq 2 2).
(* = nat_eq 2 2 : {2%nat = 2%nat} + {2%nat <> 2%nat} *)
```

`Qed`

always creates opaque definitions. To obtain a
transparent definition that evaluates properly, the proof script must be
terminated by `Defined`

instead:

```
Definition nat_eq (x y: nat) : {x=y} + {x<>y}.
Proof. decide equality. Defined.
Compute (nat_eq 2 2).
(* = left eq_refl : {2%nat = 2%nat} + {2%nat <> 2%nat} *)
```

The `Print Assumptions`

command can be used to check for
opaque names. However, as we see later, an opaque name does not always
prevent full evaluation, and opaque definitions are sometimes preferable
to transparent ones.

Another source of opacity, of the “after the fact” kind, is the
`Opaque`

command, which makes opaque a previous transparent
definition. Its effect can be undone with the `Transparent`

command. The virtual machine-based evaluator ignores opacity coming from
`Opaque`

, but the interpreter-based evaluator honors it:

```
Definition x := 2.
Opaque x.
Compute (x + x). (* = 4 : Z *)
Eval cbv in (x + x). (* = huge useless term *)
```

### Dependently-typed data structures

A very interesting feature of Coq, from a functional programming standpoint, is its support for dependently-typed data structures, containing both data and proofs of properties about them. Such data structures can be a challenge for evaluation: intuitively, we want to evaluate the data parts fully, but may not want to evaluate the proof parts fully (because proof terms can be huge and their evaluation takes too much time), or may not be able to evaluate the proof parts fully (because they use theorems that were defined opaquely).

A paradigmatic example of dependently-typed data structure is the
subset type `{ x : A | P x }`

, which is shorthand for
`sig A P`

and defined in the Coq standard library as:

```
Inductive sig (A:Type) (P:A -> Prop) : Type :=
exist : forall x:A, P x -> sig P.
```

Intuitively, terms of type `{ x : A | P x }`

are pairs of
a value of type `A`

and of a proof that this value satisfies
the predicate `P`

.

Let us use a subset type to work with integers greater than 1:

```
Definition t := { n: Z | n > 1 }.
Program Definition two : t := 2.
Next Obligation. omega. Qed.
Program Definition succ (n: t) : t := n + 1.
Next Obligation. destruct n; simpl; omega. Qed.
```

The `Program`

facility that we used above makes it easy to
work with subset types: to build terms of such types, the programmer
specifies the data part (e.g. `2`

or `n + 1`

above), and the proof parts are left as proof obligations, which can be
solved using proof scripts. There are other ways, for example by writing
proof terms by hand:

```
Definition two : t := exist _ 2 (refl_equal Gt).
```

or by using interactive proof mode for the whole term:

```
Definition two : t.
Proof. exists 2. omega. Defined.
```

But how well does this compute? Let’s compute
`succ two`

:

```
Compute (succ two).
(* = exist (fun n : Z => n > 1) 3
(succ_obligation_1 (exist (fun n : Z => n > 1) 2 two_obligation_1)) : t *)
```

This is not too bad: the value part of the result was completely
evaluated to `3`

, while the proof part got stuck on the
opaque lemmas introduced by `Program`

. The reason why this is
not too bad is that, often, subset types are used locally to transport
invariants on data structures, but the final result we are interested in
is just the data part of the subset type, as obtained with the
`proj1_sig`

projection.

```
Compute (proj1_sig (succ two)).
(* = 3 : Z *)
```

In other words, the proof parts of values of type `t`

are
carried around during computation, but do not contribute to the final
result obtained with `proj1_sig`

. So, it’s not a problem to
have opaque names in the proof parts. Indeed, making these names
transparent (e.g. by using `Defined`

instead of
`Qed`

to terminate `Next Obligation`

) just creates
bigger proof terms that will be discarded eventually anyway.

Another classic example of dependent data type is the type
`{P} + {Q}`

of informative Booleans. Values of this type are
either the `left`

constructor carrying a proof of
`P`

, or the `right`

constructor carrying a proof
of `Q`

. A typical use is for decidable equality functions
that return not just a Boolean “equal/not equal”, but also a proof of
equality or disequality. For example, here is a decidable equality for
the subset type `t`

above:

```
Require Import Eqdep_dec.
Program Definition t_eq (x y: t) : {x=y} + {x<>y} :=
if Z.eq_dec (proj1_sig x) (proj1_sig y) then left _ else right _.
Next Obligation.
destruct x as [x Px], y as [y Py]. simpl in H; subst y.
f_equal. apply UIP_dec. decide equality.
Qed.
Next Obligation.
red; intros; elim H; congruence.
Qed.
```

Again, such definitions compute relatively well:

```
Compute (t_eq two two).
(* = left
(t_eq_obligation_1 (exist (fun n : Z => n > 1) 2 eq_refl)
(exist (fun n : Z => n > 1) 2 eq_refl) eq_refl)
: {two = two} + {two <> two} *)
```

The proof part blocks, again, on an opaque name, but, more
importantly, evaluation went far enough to determine the head
constructor `left`

, meaning that the two arguments are equal.
Typically, we use decidable equality functions like `t_eq`

in
the context of an `if`

expression that just looks at the head
constructor and discards the proof parts:

```
Compute (if t_eq two two then 1 else 2).
(* = 1 : Z *)
```

Bottom line: dependently-typed data structures such as subset types
or rich Booleans compute quite well indeed, even if their proof parts
are defined opaquely. This is due to a *phase distinction* that
functions over those types naturally obey: the data part of the result
depends only on the data part of the argument, while the proof part of
the argument is used only to produce the proof part of the result.
Consider again the `succ`

function above, with type

```
succ: { x : Z | x > 1 } -> { y : Z | y > 1 }
```

The `y`

data part of the result depends only on the
`x`

data part of the argument, via `y = x + 1`

.
The `x > 1`

proof part of the argument contributes only to
proving the `y > 1`

part of the result.

### The odd “decide equality” tactic

At first sight, the phase distinction outlined above is a natural
consequence of Coq’s sorting rules, which, to a first approximation,
prevent a term of sort `Type`

to depend on a term of sort
`Prop`

. But there are exceptions to this sorting rule, which
result in completely mysterious failures of evaluation. As I learned
through painful debugging sessions, the `decide equality`

tactic violates the phase distinction in a mysterious way.

Consider a decidable equality over the type `list t`

,
autogenerated by `decide equality`

:

```
Definition t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
Proof. decide equality. apply t_eq. Defined.
Compute (if t_list_eq (two::nil) (two::nil) then 1 else 2).
(* = if match
match
t_eq_obligation_1
(exist (fun n : Z => n > 1) 2 two_obligation_1)
(exist (fun n : Z => n > 1) 2 two_obligation_1) eq_refl
in (_ = x)
...
with
| eq_refl => left eq_refl
end
then 1
else 2 : Z *)
```

Whazzat? The normal form is 40 lines long! Clearly,
`t_list_eq (two::nil) (two::nil)`

failed to reduce to
`left`

of some equality proof. Apparently, it got stuck on
the opaque proof `t_eq_obligation_1`

before reaching the
point where it can decide between `left`

(equal) and
`right`

(not equal). But that violates the phase distinction!
The `left`

/`right`

data part of the result should
not depend on the proof term `t_eq_obligation_1`

!

Something fishy is going on. But maybe we can circumvent it by using
`Defined`

instead of `Qed`

in the proof
obligations of `t_eq`

? Doing so only delays failure:
computation goes further but produces a 200-line term that is blocked on
the opaque lemma `UIP_dec`

from Coq’s standard library… I
played this “whack-a-mole” game for hours, copying parts of the Coq
standard library to make lemmas more transparent, in the hope that
functions produced by `decide equality`

will compute
eventually.

Then I realized that the problem lies with
`decide equality`

. The term it produces is roughly the same
one would get with the following proof script:

```
Definition bad_t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
Proof.
induction x as [ | xh xt]; destruct y as [ | yh yt].
- left; auto.
- right; congruence.
- right; congruence.
- destruct (t_eq xh yh).
+ subst yh. (* HERE IS THE PROBLEM *)
destruct (IHxt yt).
* left; congruence.
* right; congruence.
+ right; congruence.
Defined.
```

Notice the `subst`

in the first `+`

bullet? In
the case where `x`

and `y`

are not empty and their
heads are equal, it eliminates the proof of equality between the heads
before recursing over the tails and finally deciding whether to produce
a `left`

or a `right`

. This makes the
`left`

/`right`

data part of the final result
dependent on a proof term, which in general does not reduce!

In this particular example of lists, and in all cases involving
ML-like data types, this early elimination of an equality proof is
useless: if we just remove the `subst xh`

, we get a perfectly
good decidable equality that respects the phase distinction and computes
just fine:

```
Definition good_t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
Proof.
induction x as [ | xh xt]; destruct y as [ | yh yt].
- left; auto.
- right; congruence.
- right; congruence.
- destruct (t_eq xh yh).
+ destruct (IHxt yt).
* left; congruence.
* right; congruence.
+ right; congruence.
Defined.
Compute (if (good_list_eq (two::nil) (two::nil)) then 1 else 2).
(* = 1 : Z *)
```

The only case where the current behavior of
`decide equality`

would be warranted is for dependently-typed
data types like the following:

```
Inductive bitvect : Type := Bitvect (n: nat) (v: Vector.t bool n).
```

When comparing two values of `bitvect`

type, after
checking that their `n`

components are equal, we must
substitute one `n`

by the other so that the two
`v`

components have the same type and can be compared in
turn. However, `decide equality`

just does not work on the
`bitvect`

type above, producing an ill-typed term… So much
for my sympathetic explanation of the odd behavior of
`decide equality`

!

### Epilogue

After reimplementing `decide equality`

in 20 lines of Ltac
that generate phase-distinction-correct functions (thank you very much),
and performing a zillion other changes in CompCert’s Coq sources, I was
finally able to execute whole CompCert passes from within Coq. If you
are wondering about performance, Coq’s VM runs CompCert at 2/3 of the
speed of CompCert extracted to OCaml then compiled to bytecode, and 15%
of the speed of the “real” CompCert, extracted to OCaml then compiled to
native code. Happy end for a terrible hack!