Coq offers several internal evaluation mechanisms that have many uses, from glorious proofs by reflection to mundane testing of functions. Read on for an account of my quest to get large parts of the CompCert verified C compiler to execute from within Coq. It was a mighty battle between the forces of transparency and opacity, with the innocent-looking "decide equality" tactic as the surprise villain...

Recently, I spent significant time trying to execute parts of the CompCert verified C compiler directly within Coq. The normal execution path for CompCert is to extract (using Coq's extraction mechanism) OCaml code from the Coq function definitions of CompCert, then compile this OCaml code and link it with hand-written OCaml code. Executing Coq definitions from within Coq itself bypasses extraction and provides a more lightweight way to run them on sample inputs.

That's the theory; in practice, I ran into many problems, some specific to CompCert, others of a more general nature. In this post, I describe some of the "gotcha's" I ran into. Most of this material is old news for expert Coq users, but writing it down could help others avoiding the pitfalls of Coq's execution mechanisms.

Coq's evaluation mechanisms

Readers unfamiliar with Coq may wonder why a proof assistant should be able to evaluate functional programs at all. Owing to the conversion rule (types/propositions are identified up to reductions), evaluation is an integral part of Coq's logic. The Coq implementation provides several evaluation mechanisms:

  • compute, an interpreter that supports several evaluation strategies (lazy, call-by-value, etc).
  • vm_compute, which relies on compilation to the bytecode of a virtual machine, implements call-by-value evaluation, and delivers performance comparable to that of the OCaml bytecode compiler.
  • native_compute, introduced in the upcoming 8.5 release of Coq, relies on the OCaml native-code compiler to produce even higher performance.

Here is an example of evaluation:

Require Import Zarith.
Open Scope z_scope.

Fixpoint fib (n: nat) : Z :=
  match n with 0%nat => 1 | 1%nat => 1 | S (S n2 as n1) => fib n1 + fib n2 end.

Compute (fib 30%nat).

The Coq toplevel prints = 1346269 : Z, taking about 0.2s for the evaluation. The Compute command is shorthand for Eval vm_compute in, and therefore uses the virtual machine evaluator. We can also use the interpreter instead, obtaining the same results, only slower:

Eval cbv  in (fib 30%nat).   (* takes 2.3s *)
Eval lazy in (fib 30%nat).   (* takes 16s *)

On the "ML subset" of Coq, like the example above, Coq's evaluators behave exactly as one would expect. This "ML subset" comprises non-dependent data types and plain function definitions. Other features of Coq are more problematic evaluation-wise, as we now illustrate.

Beware opacity!

The most common source of incomplete evaluations is opaque names. An obvious example is names that are declared using Parameter or Axiom but not given a definition. Consider:

Parameter oracle: bool.
Compute (if oracle then 2+2 else 3+3).
    (* = if oracle then 4 else 6 : Z *)

Coq is doing its best here, evaluating the then and else branches to 4 and 6, respectively. Since it does not know the value of the oracle Boolean, it is of course unable to reduce the if entirely.

A less obvious source of opacity is definitions conducted in interactive proof mode and terminated by Qed:

Definition nat_eq (x y: nat) : {x=y} + {x<>y}.
Proof. decide equality. Qed.

Compute (nat_eq 2 2).
    (* = nat_eq 2 2 : {2%nat = 2%nat} + {2%nat <> 2%nat} *)

Qed always creates opaque definitions. To obtain a transparent definition that evaluates properly, the proof script must be terminated by Defined instead:

Definition nat_eq (x y: nat) : {x=y} + {x<>y}.
Proof. decide equality. Defined.

Compute (nat_eq 2 2).
    (* = left eq_refl : {2%nat = 2%nat} + {2%nat <> 2%nat} *)

The Print Assumptions command can be used to check for opaque names. However, as we see later, an opaque name does not always prevent full evaluation, and opaque definitions are sometimes preferable to transparent ones.

Another source of opacity, of the "after the fact" kind, is the Opaque command, which makes opaque a previous transparent definition. Its effect can be undone with the Transparent command. The virtual machine-based evaluator ignores opacity coming from Opaque, but the interpreter-based evaluator honors it:

Definition x := 2.
Opaque x.
Compute (x + x).      (* = 4 : Z *)
Eval cbv in (x + x).  (* = huge useless term *)

Dependently-typed data structures

A very interesting feature of Coq, from a functional programming standpoint, is its support for dependently-typed data structures, containing both data and proofs of properties about them. Such data structures can be a challenge for evaluation: intuitively, we want to evaluate the data parts fully, but may not want to evaluate the proof parts fully (because proof terms can be huge and their evaluation takes too much time), or may not be able to evaluate the proof parts fully (because they use theorems that were defined opaquely).

A paradigmatic example of dependently-typed data structure is the subset type { x : A | P x }, which is shorthand for sig A P and defined in the Coq standard library as:

Inductive sig (A:Type) (P:A -> Prop) : Type :=
    exist : forall x:A, P x -> sig P.

Intuitively, terms of type { x : A | P x } are pairs of a value of type A and of a proof that this value satisfies the predicate P.

Let us use a subset type to work with integers greater than 1:

Definition t := { n: Z | n > 1 }.

Program Definition two : t := 2.
Next Obligation. omega. Qed.

Program Definition succ (n: t) : t := n + 1.
Next Obligation. destruct n; simpl; omega. Qed.

The Program facility that we used above makes it easy to work with subset types: to build terms of such types, the programmer specifies the data part (e.g. 2 or n + 1 above), and the proof parts are left as proof obligations, which can be solved using proof scripts. There are other ways, for example by writing proof terms by hand:

Definition two : t := exist _ 2 (refl_equal Gt).

or by using interactive proof mode for the whole term:

Definition two : t.
Proof. exists 2. omega. Defined.

But how well does this compute? Let's compute succ two:

Compute (succ two).
   (*  = exist (fun n : Z => n > 1) 3
            (succ_obligation_1 (exist (fun n : Z => n > 1) 2 two_obligation_1)) : t *)

This is not too bad: the value part of the result was completely evaluated to 3, while the proof part got stuck on the opaque lemmas introduced by Program. The reason why this is not too bad is that, often, subset types are used locally to transport invariants on data structures, but the final result we are interested in is just the data part of the subset type, as obtained with the proj1_sig projection.

Compute (proj1_sig (succ two)).
   (*  = 3 : Z *)

In other words, the proof parts of values of type t are carried around during computation, but do not contribute to the final result obtained with proj1_sig. So, it's not a problem to have opaque names in the proof parts. Indeed, making these names transparent (e.g. by using Defined instead of Qed to terminate Next Obligation) just creates bigger proof terms that will be discarded eventually anyway.

Another classic example of dependent data type is the type {P} + {Q} of informative Booleans. Values of this type are either the left constructor carrying a proof of P, or the right constructor carrying a proof of Q. A typical use is for decidable equality functions that return not just a Boolean "equal/not equal", but also a proof of equality or disequality. For example, here is a decidable equality for the subset type t above:

Require Import Eqdep_dec.

Program Definition t_eq (x y: t) : {x=y} + {x<>y} :=
  if Z.eq_dec (proj1_sig x) (proj1_sig y) then left _ else right _.
Next Obligation.
  destruct x as [x Px], y as [y Py]. simpl in H; subst y.
  f_equal. apply UIP_dec. decide equality.
Next Obligation.
  red; intros; elim H; congruence.

Again, such definitions compute relatively well:

Compute (t_eq two two).
    (* = left
         (t_eq_obligation_1 (exist (fun n : Z => n > 1) 2 eq_refl)
            (exist (fun n : Z => n > 1) 2 eq_refl) eq_refl)
       : {two = two} + {two <> two} *)

The proof part blocks, again, on an opaque name, but, more importantly, evaluation went far enough to determine the head constructor left, meaning that the two arguments are equal. Typically, we use decidable equality functions like t_eq in the context of an if expression that just looks at the head constructor and discards the proof parts:

Compute (if t_eq two two then 1 else 2).
    (* = 1 : Z *)

Bottom line: dependently-typed data structures such as subset types or rich Booleans compute quite well indeed, even if their proof parts are defined opaquely. This is due to a phase distinction that functions over those types naturally obey: the data part of the result depends only on the data part of the argument, while the proof part of the argument is used only to produce the proof part of the result. Consider again the succ function above, with type

   succ: { x : Z | x > 1 } -> { y : Z | y > 1 }

The y data part of the result depends only on the x data part of the argument, via y = x + 1. The x > 1 proof part of the argument contributes only to proving the y > 1 part of the result.

The odd "decide equality" tactic

At first sight, the phase distinction outlined above is a natural consequence of Coq's sorting rules, which, to a first approximation, prevent a term of sort Type to depend on a term of sort Prop. But there are exceptions to this sorting rule, which result in completely mysterious failures of evaluation. As I learned through painful debugging sessions, the decide equality tactic violates the phase distinction in a mysterious way.

Consider a decidable equality over the type list t, autogenerated by decide equality:

Definition t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
Proof. decide equality. apply t_eq. Defined.

Compute (if t_list_eq (two::nil) (two::nil) then 1 else 2).
(*   = if match
                (exist (fun n : Z => n > 1) 2 two_obligation_1)
                (exist (fun n : Z => n > 1) 2 two_obligation_1) eq_refl
              in (_ = x)
          | eq_refl => left eq_refl
       then 1
       else 2 : Z *)

Whazzat? The normal form is 40 lines long! Clearly, t_list_eq (two::nil) (two::nil) failed to reduce to left of some equality proof. Apparently, it got stuck on the opaque proof t_eq_obligation_1 before reaching the point where it can decide between left (equal) and right (not equal). But that violates the phase distinction! The left/right data part of the result should not depend on the proof term t_eq_obligation_1!

Something fishy is going on. But maybe we can circumvent it by using Defined instead of Qed in the proof obligations of t_eq? Doing so only delays failure: computation goes further but produces a 200-line term that is blocked on the opaque lemma UIP_dec from Coq's standard library... I played this "whack-a-mole" game for hours, copying parts of the Coq standard library to make lemmas more transparent, in the hope that functions produced by decide equality will compute eventually.

Then I realized that the problem lies with decide equality. The term it produces is roughly the same one would get with the following proof script:

Definition bad_t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
  induction x as [ | xh xt]; destruct y as [ | yh yt].
- left; auto.
- right; congruence.
- right; congruence.
- destruct (t_eq xh yh).
+ subst yh. (* HERE IS THE PROBLEM *)
  destruct (IHxt yt).
  * left; congruence.
  * right; congruence.
+ right; congruence.

Notice the subst in the first + bullet? In the case where x and y are not empty and their heads are equal, it eliminates the proof of equality between the heads before recursing over the tails and finally deciding whether to produce a left or a right. This makes the left/right data part of the final result dependent on a proof term, which in general does not reduce!

In this particular example of lists, and in all cases involving ML-like data types, this early elimination of an equality proof is useless: if we just remove the subst xh, we get a perfectly good decidable equality that respects the phase distinction and computes just fine:

Definition good_t_list_eq: forall (x y: list t), {x=y} + {x<>y}.
  induction x as [ | xh xt]; destruct y as [ | yh yt].
- left; auto.
- right; congruence.
- right; congruence.
- destruct (t_eq xh yh).
+ destruct (IHxt yt).
  * left; congruence.
  * right; congruence.
+ right; congruence.

Compute (if (good_list_eq (two::nil) (two::nil)) then 1 else 2).
    (* = 1 : Z *)

The only case where the current behavior of decide equality would be warranted is for dependently-typed data types like the following:

Inductive bitvect : Type := Bitvect (n: nat) (v: Vector.t bool n).

When comparing two values of bitvect type, after checking that their n components are equal, we must substitute one n by the other so that the two v components have the same type and can be compared in turn. However, decide equality just does not work on the bitvect type above, producing an ill-typed term... So much for my sympathetic explanation of the odd behavior of decide equality!


After reimplementing decide equality in 20 lines of Ltac that generate phase-distinction-correct functions (thank you very much), and performing a zillion other changes in CompCert's Coq sources, I was finally able to execute whole CompCert passes from within Coq. If you are wondering about performance, Coq's VM runs CompCert at 2/3 of the speed of CompCert extracted to OCaml then compiled to bytecode, and 15% of the speed of the "real" CompCert, extracted to OCaml then compiled to native code. Happy end for a terrible hack!