Gagallium : Bove-Capretta, Reloaded

The following construction is based upon a note scribbled by Conor McFermat^W McBride on a draft of my PhD thesis a few months ago. Upon seeing an earlier version of this file, Stevan Andjelkovic (aka Mr Freemonadic) urged me to borrow one of his free monads to write my recursive functions.
The tl;dr: we intensionally characterise a class of Bove-Capretta predicates (using a custom-made universe of datatypes) with the guarantee that these predicates are "collapsible", ie. have no run-time cost. We then write a generic program that computes such a predicate from the recursive definition of a function. Using this framework, we can write a recursive function without being bothered by that administrative termination-checker and prove its termination after the fact.
module BoveReloaded where

open import Level renaming (zero to zeroL ; suc to sucL)

open import Function

open import Data.Empty
open import Data.Unit hiding (_≤?_ ; decTotalOrder ; _≤_ )
open import Data.Product
open import Data.Sum
open import Data.Nat hiding (_≤_)
open import Data.Nat.Properties
open import Data.List
open import Data.List.All

open import Relation.Nullary
open import Relation.Unary
open import Relation.Binary
open import Relation.Binary.PropositionalEquality

open import Induction
open import Induction.Nat as IndNat
open import Induction.Lexicographic

open import Category.Applicative
open import Category.Monad

-- * Universe of (collapsible) datatypes

{-

  In this file, we will *compute* (on the fly) the inductive
  family representing a given Bove-Capretta predicate. To this
  end, we need first-class inductive families, ie. inductives
  we can build from within the type theory. I'll achieve this
  by defining a *universe* of inductive families in Agda.

  A usual universe of datatypes
  would be closed under Σ-type: that's the one code that lets
  us store information in a datatype. However, because we want
  to erase our Bove-Capretta witnesses at run-time, we must
  make sure that their inductive definition has no
  computational content by forbidding the Σ-code (and its
  variants). We shall call the datatypes thus described the
  "collapsible datatypes", based on the idea that they have no
  computational content. This terminology (and claim) will be
  formally justified in the last Section.

  We thus define a set of codes :
-}

data CDesc (I : Set) : Set₁ where
  `0 : CDesc I
  `1 : CDesc I
  `X : (i : I) → CDesc I
  _`×_ : (A B : CDesc I) → CDesc I
  `Π : (S : Set)(T : S → CDesc I) → CDesc I

{-
  Followed by their interpretation, which builds functors from
  Set/I to Set:
-}

⟦_⟧ : {I : Set} → CDesc I → (I → Set) → Set
⟦ `0 ⟧ X = ⊥
⟦ `1 ⟧ X = ⊤
⟦ `X i ⟧ X = X i
⟦ A `× B ⟧ X = ⟦ A ⟧ X × ⟦ B ⟧ X
⟦ `Π S T ⟧ X = (s : S) → ⟦ T s ⟧ X

{-
  We obtain the code of (collapsible) descriptions, which
  describe endofunctors on Set/I:
-}

record CFunc (I : Set) : Set₁ where
  constructor mk
  field
    func : I → CDesc I

{-
  From which we can define a generic least fixpoint operator :
-}

data μ {I : Set}(R : CFunc I)(i : I) : Set where
  con : ⟦ CFunc.func R i ⟧ (μ R) → μ R i

{-
  From there, we can also define induction over these
  structures, but we won't need it in this file.
-}

-- * The recursion oracle:

{-

  Now, the game is to provide a domain-specific language for
  the programmer to write a recursive function and prove,
  after the fact, that the function is indeed well-defined.

-}

module RecMonad (A : Set)(B : A → Set) where

{-
  To describe a recursive function of type [(a : A) → B a], we
  take the free monad of the signature [call : (a : A) → B a]
-}

  data RecMon (X : Set) : Set where
    callRec : (a : A)(rec : B a → RecMon X) → RecMon X
    returnRec : (x : X) → RecMon X

  recMon : RawMonad RecMon
  recMon = record { return = returnRec
                  ; _>>=_ = _>>=_ }
       where  _>>=_ : ∀{X Y : Set} → RecMon X → (X → RecMon Y) → RecMon Y
              returnRec x >>= f = f x
              callRec a rec >>= f = callRec a (λ b → (rec b) >>= f)

{-
  The type recProg captures those functions defined with the
  recursion oracle :
-}

recProg  : (A : Set)(B : A → Set) → Set
recProg A B = (a : A) → RecMonad.RecMon A B (B a)

{-
  Besides the monadic structure of recProg, we can call the
  oracle with the call⟨_⟩ operation :
-}

call⟨_⟩ : {A : Set}{B : A → Set} → recProg A B
call⟨ a ⟩ = RecMonad.callRec a return
  where open RawMonad (RecMonad.recMon _ _)

{-
  Intuitively, the call⟨_⟩ operation is our oracle, providing
  a (B a)-result to any A-query. We thus write our recursive
  programs by calling the oracle instead of doing a recursive
  call.
-}

-- ** Example 1: gcd

module Gcd where

  open RawMonad (RecMonad.recMon (ℕ × ℕ) (λ _ → ℕ))

  {-
    We implement gcd pretty much as usual, using the oracle in
    the recursive cases :
  -}

  gcd : recProg (ℕ × ℕ) (λ _ → ℕ)
  gcd (0  , n)     = return n
  gcd (m , 0)      = return m
  gcd (suc m , suc n) with m ≤? n
  ... | yes _ = call⟨ suc m , n ∸ m ⟩
  ... | no  _ = call⟨ m ∸ n , suc n ⟩

open Gcd

-- ** Example: fib

module Fib where

  open RawMonad (RecMonad.recMon ℕ (λ _ → ℕ))

  {-
    We can also chain recursive calls, as per the monadic
    structure. For example, we can write the naïve Fibonacci
    function:
  -}

  fib : recProg ℕ (λ _ → ℕ)
  fib zero = return 0
  fib (suc zero) = return 1
  fib (suc (suc n)) = call⟨ suc n ⟩ >>= λ r₁ →
                      call⟨ n ⟩ >>= λ r₂ →
                      return (r₁ + r₂)

open Fib

-- * Generic Bove-Capretta predicate generator:

{-

  From a recProg, we can build a Bove-Capretta predicate
  that, intuitively, is merely the reification (as an
  inductive family) of the call-graph of the recursive
  program.

-}

module BC
  -- Let:
  {A : Set}
  {B : A → Set}
  (prog : recProg A B)
    where

  open RecMonad A B

  {-
    As it turns out, this call-graph is always a collapsible
    predicate: to "prove" this, we simply describe the
    Bove-Capretta predicate with a collapsible description
    Dom:
  -}

  dom : ∀{a} → RecMon (B a) → CDesc A
  dom (returnRec z) = `1
  dom (callRec a rec) = `X a `× `Π (B a) λ b → dom (rec b)

  Dom : CFunc A
  Dom = CFunc.mk λ a → dom (prog a)

  {-
    Then, following the Bove-Capretta technique, we can run
    the (potentially, general-recursive) function "prog" by
    recursion over its call-graph (and, therefore, not over
    its argument a):
  -}

  mutual
    run : (a : A) → μ Dom a → B a
    run a (con domS) = run1 (prog a) (mapRun {p = prog a} domS)

    mapRun : ∀{a}{p : RecMon (B a)} → ⟦ dom p ⟧ (μ Dom) → ⟦ dom p ⟧ B
    mapRun {p = returnRec x} tt = tt
    mapRun {p = callRec a rec} (domA , domRec) =
      run a domA , λ b → mapRun {p = rec b} (domRec b)

    run1 : ∀{a} → (p : RecMon (B a)) → ⟦ dom p ⟧ B → B a
    run1 (returnRec b) tt = b
    run1 (callRec a rec) (b , domRec) = run1 (rec b) (domRec b)

  {-
    Note that we are *not* using the elements of μ Dom s in a
    computationally-relevant way: they are only here to convince Agda
    that the definition (trivially) terminates.

    [Question: could we use Agda's irrelevance modality to make that
     clear in this file?]

    In fact, we know for sure that these elements cannot be
    computationally-relevant: being collapsible, there is
    nothing in μ Dom to compute with! At run-time, Edwin would
    just ignore it.
  -}

-- * Example: Gcd

-- ** Naive implementation

{-
  Naively, one could implement gcd as follows:

gcd : (m n : ℕ) → ℕ
gcd 0 n = n
gcd m 0 = m
gcd (suc m) (suc n) with m ≤? n
gcd (suc m) (suc n) | yes p = gcd (suc m) (n ∸ m)
gcd (suc m) (suc n) | no ¬p = gcd (m ∸ n) (suc n)

  But you need the 'Trust-me-I-am-a-doctor' flag to convince Agda
  that, well, you know what you're doing:

{-# OPTIONS --no-termination-check #-}

module Test where
  test1 : gcd 4 5 ≡ 1
  test1 = refl

  test2 : gcd 30 35 ≡ 5
  test2 = refl

  test3 : gcd 70 105 ≡ 35
  test3 = refl
-}

-- ** Using Bove-Capretta

{-

  Disclaimer: For the curious, check
  Data.Nat.GCD
  in Nisse's standard library for the "proper" way to
  implement GCD, purely by induction.  Also, have a sip of
  Conor's 'Djinn, monotonic'
  for an introduction to the framework at work in Nisse's
  code.

  A (perhaps?) less elegant way to implement the GCD function
  consists in using the above Bove-Capretta construction. The
  benefit of this approach is that you don't need at least
  2 Phds to make sense of it (it took me one, your mileage may
  vary).

  Applying our generic machinery to the recursive definition
  of gcd, we obtain the Bove-Capretta predicate:

-}

DomGCD : ℕ × ℕ → Set
DomGCD (m , n) = μ (BC.Dom gcd) (m , n)

{-
  And, still applying our generic machinery, we get that, for
  any two input numbers satisfying the Bove-Capretta
  predicate, we can compute their gcd:
-}

gcd-bove : (m n : ℕ) → DomGCD (m , n) → ℕ
gcd-bove m n xs = BC.run gcd (m , n) xs

-- *** GCD is general recursive:

{-
  Now, we can get rid of that pesky DomGCD predicate by
  proving, after the fact, that our gcd function is indeed
  terminating. For that, we simply have to prove that DomGCD
  is inhabited for any input numbers m and n :
-}

gcd-wf : (m n : ℕ) → DomGCD (m , n)
gcd-wf m n = build ([_⊗_] IndNat.<-rec-builder IndNat.<-rec-builder)
                   (λ { (m , n) → DomGCD (m , n) })
                   (λ { (m , n) rec → con (ih m n rec) })
                   (m , n)
  where ih : ∀ x y → (IndNat.<-Rec ⊗ IndNat.<-Rec) DomGCD (x , y) → ⟦ BC.dom gcd (gcd (x , y)) ⟧ DomGCD
        ih zero y rec = tt
        ih (suc x) zero rec = tt
        ih (suc x) (suc y) rec with x ≤? y
        ih (suc x) (suc y) (rec-x , rec-y) | yes p = (rec-x (y ∸ x) (s≤′s (≤⇒≤′ (n∸m≤n x y)))) , (λ _ → tt)
        ih (suc x) (suc y) (rec-x , rec-y) | no ¬p = rec-y ((x ∸ y)) ((s≤′s (≤⇒≤′ (n∸m≤n y x)))) (suc y) , (λ _ → tt)

{-
  And we get the desired gcd function:
-}

gcd' : (m n : ℕ) → ℕ
gcd' m n = gcd-bove m n (gcd-wf m n)

module TestGcd where
  test0 : gcd' 0 5 ≡ 5
  test0 = refl

  test0' : gcd' 4 0 ≡ 4
  test0' = refl

  test1 : gcd' 4 5 ≡ 1
  test1 = refl

  test2 : gcd' 30 35 ≡ 5
  test2 = refl

  test3 : gcd' 70 105 ≡ 35
  test3 = refl

-- * Example: Fibonacci

{-
  The same goes for the Fibonacci function, so I'll be
  mercilessly brief.
-}

DomFib : ℕ → Set
DomFib m = μ (BC.Dom fib) m

fib-wf : (n : ℕ) → DomFib n
fib-wf a = cRec (λ n → DomFib n) (λ n r → con (ih {n} r)) a
  where ih : ∀{n} → IndNat.CRec DomFib n → ⟦ BC.dom fib (fib n) ⟧ DomFib
        ih {zero} ih = tt
        ih {suc zero} ih = tt
        ih {suc (suc n)} (fib-sn , fib-n , xs) = fib-sn , λ n' → fib-n , λ _ → tt

fib' : ℕ → ℕ
fib' n = BC.run fib n (fib-wf n)

module TestFib where
  fib'0 : fib' 0 ≡ 0
  fib'0 = refl

  fib'1 : fib' 1 ≡ 1
  fib'1 = refl

  fib'2 : fib' 2 ≡ 1
  fib'2 = refl

  fib'3 : fib' 3 ≡ 2
  fib'3 = refl

  fib'4 : fib' 4 ≡ 3
  fib'4 = refl

  fib'5 : fib' 5 ≡ 5
  fib'5 = refl


-- * Collapse

{-

  This is all very well but we've traded the freedom from
  termination checking for the burden of carrying
  Bove-Capretta around: how much of our foot do we have left?
  Well, in fact, once the dust settles, we should have as many
  toes as before and that's thanks to the "collapsibility" of
  the CDesc universe.

  In "Inductive Families Need Not Store Their Indices",
  Edwin Brady, Conor McBride, and James McKinna describe
  a *run-time* optimisation called "collapsing" (Section 6):

  An inductive family D : I → Set is *collapsible* if
    for every index i,
        if a, b : D i, then a ≡ b (extensionally)

  That is, the index i determines entirely the content of the
  inductive family. Put otherwise, the inductive family has no
  computational content, hence the name 'collapsible':
  morally, it collapses to a single element.

  Remark:

  In the lingo of Homotopy Type Theory, a collapsible type
  D : I → Set corresponds to a family of
  h-propositions,
  ie. we have:

     ∀ i → isProp(D i)

  That is:
     ∀ i → ∀ (x y : D i) → x ≡ y

-}

-- ** Example: ≤ relation (Section 6)

data _≤`_ : ℕ → ℕ → Set where
  le0 : ∀{n} → 0 ≤` n
  leS : ∀{m n} → m ≤` n → suc m ≤` suc n

-- This datatype is collapsible:

≤-collapsible : ∀{m n} → (a b : m ≤` n) → a ≡ b
≤-collapsible {zero} le0 le0 = refl
≤-collapsible {suc m} {zero} () b
≤-collapsible {suc m} {suc n} (leS a) (leS b) rewrite ≤-collapsible a b = refl


-- ** CFunc describes collapsible datatypes :

{-

  Assuming extensionality (by working in the Squash
  applicative ⊢_), we can prove (generically) that fixpoints
  of CDesc are indeed collapsible.

  Briefly, here is the Squash applicative:

-}

infixr 1 ⊢_
data ⊢_ {l : Level}(X : Set l) : Set l where
  pf : .X → ⊢ X

Applicative⊢ : RawApplicative ⊢_
Applicative⊢ = record { pure = pf
                      ; _⊛_ = _⊛_ }
    where _⊛_ : {A B : Set} → ⊢ (A → B) → ⊢ A → ⊢ B
          (pf f) ⊛ (pf a) = pf (f a)

postulate
  extensionality : {A : Set}{B : A → Set}{f g : (a : A) → B a} →
                   ((x : A) → ⊢ (f x ≡ g x)) → ⊢ (f ≡ g)

-- Using this machinery we can prove that any family R is
-- collapsible:

CDesc-collapse : ∀{I i}{R} → (xs ys : μ R i) → ⊢ xs ≡ ys
CDesc-collapse {I}{R = R} (con xs) (con ys) = cong con <$> help (CFunc.func R _) xs ys
  where open RawApplicative Applicative⊢

        help : (D : CDesc I) → (xs ys : ⟦ D ⟧ (μ R)) → ⊢ xs ≡ ys
        help `0 () _
        help `1 tt tt = pf refl
        help (`X i) (con xs₁) (con ys₁) = cong con <$> help (CFunc.func R i) xs₁ ys₁
        help (D₁ `× D₂) (xs₁ , xs₂) (ys₁ , ys₂) = cong₂ _,_ <$> help D₁ xs₁ ys₁ ⊛ help D₂ xs₂ ys₂
        help (`Π S T) f g = extensionality λ s → help (T s) (f s) (g s)

{-

  Edwin's
  compiler
  should therefore be able to optimise away our Bove-Capretta
  predicates away (at run-time only!).

-}


-- * Conclusion:

{-

  Having implemented gcd with a recursion oracle (and having
  faith in Edwin's optimiser), we are guaranteed that gcd
  suffers no run-time cost associated with the construction or
  consumption of the Bove-Capretta predicate we've generically
  constructed.

  Remark: This may be more than a fancy trick. In order to
  mark the accessibility predicates as 'computationally
  irrelevant', Coq users define their accessibility predicates
  in Prop. The system then checks that these definitions are
  'singleton types', ie. have a single constructor (hey,
  that's our notion of 'collapsible', isn't it?). However, to
  be usable on the Set (computationally-relevant) side, Coq
  has to support the elimination of these 'singleton types'
  into Set. The benefit is that, being in Prop, the
  accessibility predicate is simply not extracted by Coq. The
  inconvenient is the loss of strong normalisation of the
  calculus (see Letouzey's PhD, 'L'elimination des
  singletons', p.61): we have to restrict ourselves to weak
  reductions, ie. no reduction can happen under a lambda.

  Using our more structural approach (extracting a class of
  datatypes that can be automatically ignored), we may be able
  to have an OTT-style Prop (where only ⊥ : Prop can be
  eliminated over) while paying no run-time cost for gcd's
  collapsible argument. That's a strongly normalising calculus
  for you, at no expense.

-}