Gagallium : Formally verifying the complexity of OCaml programs with CFML -- interlude

This small interlude post is not about Coq nor CFML: I present a data structure from Okasaki’s Purely functional data structures, implemented in OCaml.

It comes as a preliminary explanation for an incoming formalization of it in CFML (see next post), including an complexity analysis!

The structure we decided to study is Okasaki’s “binary random access list”. It is a functional data structure, which features usual list operations (cons and uncons for adding and removing an element at the head of the list), but also random access lookup and update. That is, you can add or remove an element at the head of the list, but also modify or query the \(i^{th}\) element of the structure.

These four operations perform in worst-case \(O(log(n))\) steps, where \(n\) is the number of elements stored in the structure.

Let us see step by step how it is implemented in OCaml. The source code for the complete implementation can be found here.

Type definitions, implicit invariants

A binary random access list is a list of binary trees: we first define a tree OCaml type.

  type 'a tree = Leaf of 'a | Node of int * 'a tree * 'a tree

Notice that only the leaves store elements. Nodes contain an integer corresponding to the number of elements stored (in the leaves) in the tree, which makes a size function trivial to implement:

  let size = function
    | Leaf x -> 1
    | Node (w, _, _) -> w

Now, a binary random access list is a list of either a tree, either nothing. We consequently define an “tree option” type, here dubbed “digit”.

  type 'a digit = Zero | One of 'a tree

The name “digit” comes of the similarity between a binary random access list and a list of bits, representing an integer - adding an element at the head being similar to incrementing the integer, etc. We’ll see more of that later.

Finally, we define the type for the whole structure: the binary random access list.

  type 'a rlist = 'a digit list

A valid binary random access list should satisfy some additional invariants:

Trees are complete trees – a tree of depth \(d\) always has \(2^d\) leaves;
Any rlist contains trees of increasing depth starting at some depth \(p\): if the \(i^{th}\) cell (indexing from \(0\)) contains a tree (is of the form One t), then this tree has depth \(p+i\).
A complete binary random access list is a rlist with starting depth \(p\) equal to \(0\): its first tree, if present, is only a leaf.

To sum up, a binary random access list is a list, which stores in its \(i^{th}\) cell either nothing, or a complete binary tree of depth \(i\).

As an example, a binary random access list storing the sequence \(1, 2, 3, 4, 5\) can be represented as:

Binary random access lists look like integers in binary

As I mentioned before, a binary random access list is in fact quite similar to an integer represented in binary - i.e. as a list of bits.

Actually, if you erase the trees from the implementation (type 'a digit = Zero | One of 'a tree becomes type digit = Zero | One), you obtain an implementation of integers, represented as a list of bits (least significant bit at the head of the list); the cons operation being incrementing, the uncons one decrementing (lookup and update do not make much sense, though).

How do you increment an integer? You look at the least significant bit; if it’s a zero, you turn it into a one; if it’s a one, you turn it into zero, and recursively continue to increment, starting with the next bit.

“Consing” to a binary random access list is similar, except that we have to handle a bit more information: the elements of the structure, stored in the trees.

Instead of adding \(1\), we add a tree t (more precisely, a digit One t): if the first element of the list is Zero, we turn it into One t. If it’s a One t', we turn it into Zero, and recursively continue, but with a new tree: Node (size t + size t', t, t'). This corresponds to the link operation, which combines two trees of depth \(d\) into one of depth \(d+1\).

  let link t1 t2 =
    Node (size t1 + size t2, t1, t2)

The OCaml implementation follows:

  let rec cons_tree t = function
    | [] -> [One t]
    | Zero :: ts -> One t :: ts
    | One t' :: ts -> Zero :: cons_tree (link t t') ts

  let cons x ts =
    cons_tree (Leaf x) ts

The uncons operations follows the same idea: it’s just like decrementing, except that you also have to invert the link operation to obtain trees of smaller depth.

  let rec uncons_tree = function
    | [] -> raise Empty
    | [One t] -> (t, [])
    | One t :: ts -> (t, Zero :: ts)
    | Zero :: ts ->
      match uncons_tree ts with
      | Node (_, t1, t2), ts' -> (t1, One t2 :: ts')
      | _ -> assert false

  let head ts =
    match uncons_tree ts with
    | (Leaf x, _) -> x
    | _ -> assert false

  let tail ts =
    let (_,ts') = uncons_tree ts in ts'

Unconsing a rlist of starting depth \(p\) always returns a tree of depth \(p\) (or fails if the list is empty). In particular, as the binary access list manipulated by the user always starts with depth \(0\), we can assume in the implementation of head that the unconsed tree is a leaf.

Random access

Random access lookup and update are quite easy to implement.

The idea is to walk through the list; faced with a Node (w, l, r) tree, we know how much elements it contains: it is exactly w. Knowing the index \(i\) of the element we want to visit, we can compare it to w to know whether we should explore this tree (if \(i < w\)), or look for the element \(i - w\) in the rest of the list.

  let rec lookup i = function
    | [] -> raise (Invalid_argument "lookup")
    | Zero :: ts -> lookup i ts
    | One t :: ts ->
      if i < size t
      then lookup_tree i t
      else lookup (i - size t) ts

If we have to walk through the tree, we can also do this without performing an exhaustive exploration: by comparing the index to half the size of the tree, we can decide whether we should explore the left or right subtree.

  let rec lookup_tree i = function
    | Leaf x -> if i = 0 then x else raise (Invalid_argument "lookup")
    | Node (w, t1, t2) ->
      if i < w/2
      then lookup_tree i t1
      else lookup_tree (i - w/2) t2

The update function works in a similar fashion.

  let rec update_tree i y = function
    | Leaf x -> if i = 0 then Leaf y else raise (Invalid_argument "update")
    | Node (w, t1, t2) ->
      if i < w/2
      then Node (w, update_tree i y t1, t2)
      else Node (w, t1, update_tree (i - w/2) y t2)

  let rec update i y = function
    | [] -> raise (Invalid_argument "update")
    | Zero :: ts -> Zero :: update i y ts
    | One t :: ts ->
      if i < size t
      then One (update_tree i y t) :: ts
      else One t :: update (i - size t) y ts

Next time

Final post incoming, where I present a formalization of this exact OCaml implementation, using CFML and the time credits-related functions I presented in the first post. Stay tuned!