Gagallium : Typestate in Mezzo? Starting with list iterators.

I (Armaël Guéneau) am currently doing an internship with François Pottier, working on Mezzo, which has been introduced by Jonathan in two previous blog posts (the first, the second).

Since the beginning of my internship, I have been playing with Mezzo, writing some code, and, more specifically, trying to see how the notion of typestate could be expressed with Mezzo’s permissions. As an application, I tried to write in Mezzo an iterator on lists. What I call an iterator is here more like Scala’s Iterator , or a bit like what Gabriel called generators in a previous blog post.

This example turned out to be subtle enough to write in Mezzo: in this post, I’ll try to show you the details of the implementation, leading to a fully working implementation of list iterators. I think it’s a good opportunity to see an implementation of a (very simple) typestate, and also some funny tricks with Mezzo’s permissions.

A word of warning, though: while the theory and implementation of Mezzo are starting to fit in nicely, the library-land is very much unknown territory so far. We are trying new things, and expect them to be easier in the future. As always, practice and teaching will surely yield substantial improvements, leading us to see in retrospect how we could have simplified things. Expect the code examples in this post to look complicated, and probably not representative of the Mezzo code we expect to write in the future.

Briefing

What I want as an iterator is an object that let us iterate on a collection, giving one new element each time we call a function next, that makes the iterator go a step forward. Note that such an iterator is mutable, its internal state being modified by next. It would be possible to consider functional iterators, returning a value corresponding to the next position in the list; but to study the relation with typestate systems we decide to study mutable iterators here.

We have to handle the case where the iterator has no more elements. In Java or Scala, you have to check if there are more elements available with hasNext, and if you call next on an empty iterator, an exception is raised. In Mezzo we don’t have exceptions. Moreover, we want to statically express the protocol that the operations on an iterator must follow, in the types themselves. It’s the idea of typestate. By achieving that, the user is prevented at compilation time of using next on an empty iterator.

The application to (simply linked) lists seems straightforward: you just have to follow the tail link of each Cons cell, starting with the head of the list. What is not so trivial is how to express that with Mezzo’s permissions.

The silent iterator

Let’s start with a very stupid iterator: it traverses the list, but without giving its elements to the user.

From the outside: signatures

An iterator has exclusive access on the list

First, to be able to iterate on a list, the iterator will need the permission to access the list and its contents. A solution is to consume the permission l @ list a when you create an iterator on the list l (of elements of type a), and give it back when the iteration is finished (or when you stop the iterator manually).

This gives us the following signatures for the new and stop functions:

val new: [a] (consumes l: list a) -> iterator a (l @ list a)
val stop: [a, post: perm] (consumes iterator a post) -> (| post)

In case you’re not familiar with Mezzo syntax yet, you can find more details in the first post cited above, but let me just do a quick reminder here. The bracket notation [post:perm] is parametric polymorphism on a type of kind perm (a permission), and that (consumes foo) indicates that type foo is not given back to the type environment after the functional call. Finally, (foo | bar) is a conjunction of the type foo and the permission bar, which may be a purely static information, not associated to any runtime value; in particular, (| post) is an empty tuple that is only useful as the carrier of the permission post.

Expressing iterator typestate

We also need a next function, that takes an iterator in input. To handle the fact that next may lead to an empty iterator, we say that next consumes the fact that the input argument is an iterator, and returns a variant of option a:

data offer (post: perm) a =
| None {| post }
| Some { x: a }

In the first case, the iteration is finished: the post permission (in practice equal to l @ list a for a given a and l) is returned. In the second case, an element is returned. Note that we could have used the sum type of the standard library, choice a b, but this specific datatype allows us to give more explicit constructor names (than Left and Right).

We now have the next signature:

val next: [a, post: perm] (consumes it: iterator a post) ->
  offer post (| it @ iterator a post)

For now, because our iterator is silent, in the Some case, we return no value of type a, only the fact that it is still an iterator, so we can continue the iteration. On the contrary, after a None answer, it is statically not possible to call next again: the permission it @ iterator a post has been consumed and was not returned through the offer.

A small code example using this iterator:

(* Loop calls [next] on the iterator until it is empty *)
val rec loop [a, post: perm] (consumes it: iterator a post): (| post) =
  match next it with
  | None -> ()
  | Some { x } -> loop it
  end

Diving into the internals: implementation

A first attempt

To be able to go forward, the iterator must store the elements that will be explored in the future. With a list, it’s easy: initially, it consists in the list itself, and each call to next just takes the tail of the current internal list.

This gives us:

data mutable iterator a (post: perm) = Iterator {
  xs: list a
}

data offer (post: perm) a =
| None { | post }
| Some { x: a }

val new [a] (consumes l: list a): iterator a (l @ list a) =
  Iterator { xs = l }

val next [a, post: perm] (consumes it: iterator a post):
  offer post (| it @ iterator a post) =
  match it.xs with
  | Nil ->
    None
  | Cons { head; tail } ->
    it.xs <- tail;
    Some { x = () }
  end

Sadly, this example doesn’t typecheck: in the match case where it.xs is Nil, we return None, and the permission post. However, we don’t have post!

Formally, at the beginning of next, the only available permissions are it @ iterator a post, and in the first match case, it.xs @ Nil. Nothing here gives us post.

Intuitively, even if we had post at the beginning, next here doesn’t preserves the knowledge of the cons cells we have already explored: we have to store in the iterator the permissions of the previous cons cells, to be able to finally merge them back into post.

Storing the old permissions

We introduce a new permission, p, that describes the permission for the consumed cons cells. The iterator contains p, and a function, rewind, that consumes p, and the permission on the tail, and merge them into post.

data mutable iterator a (p: perm) (post: perm) = Iterator {
  content: (
    xs: list a,
    rewind: (| consumes (p * xs @ list a)) -> (| post)
    | p
  )
}

With this definition of iterator, the signature of next would be:

val next: [a, p: perm, post: perm] (
  consumes it: Iterator {
    content: (
      xs: list a,
      rewind: (| consumes (p * xs @ list a)) -> (| post)
      | p
    )
  } ->
  offer post (| it @ iterator a (p * xs @ Cons { head: a; tail: unknown }) post)

The idea is that before the call to next, the iterator stores in xs the permission on the non-traversed part of the list, xs, and rewind requests the permission on the already-traversed part of the list, represented by p, upto xs excluded. If xs is itself a cons cell (and only in this case), we can call next; the iterator will then store only the tail of xs, and its rewind function request the permission for p, plus the first cell of xs – which at this point as been traversed.

Concretely, imagine we have the following list construction, for some list lb @ list int.

val la = Cons { head=1; tail=lb }

and are now iterating on this list. Assuming we have already called next once, have traversed the first cell of la, the rewind function of the iterator would have a type equivalent to the following:

rewind: (| consumes (
             la @ Cons { head:int; tail=lb }
           * lb @ list int)
        )
        -> (| post )

If we pattern-match on lb, in the Cons case, the typing environment will learn that lb has type Cons { head : int; tail = lc } for some tail lc @ list int. So rewind has the refined type

rewind: (| consumes (
             la @ Cons { head:int; tail=lb }
           * lb @ Cons { head:int; tail=lc })
        )
        -> (| post )

The already-traversed part of the list, la, has the same type, but the not-yet-traversed part has been refined to a cons type. Note that with the additional hypothesis lc @ int of our context, this is equivalent to the following type:

rewind: (| consumes (
             (la @ Cons { head:int; tail=lb } * lb @ Cons { head:int; tail=lc })
           * lc @ list int
        )
        -> (| post )

which is precisely the type of the rewind function of the iterator returned by next. So after pattern-matching, the type of the rewind function passed to next becomes exactly the same as the type of the rewind function expected as a return value. We can return this function, unchanged: it has just been transtyped.

val next it (* lengthy type annotation that we won't repeat here *) =
  let (xs, rewind) = it.content in
  match xs with
  | Nil ->
    (* p * xs @ list a *)
    rewind ();
    (* post *)
    None
  | Cons { head; tail } ->
    it.content <- (tail, rewind);
    Some
  end

As we described, in the Cons case, the value of the xs field of it is changed to tail, but the rewind field is unchanged.

Remark: we can still shorten this definition by quantifying p existentially in the definition of iterator, and the typechecker will be able to pack and unpack the quantification to do implicitly what we’ve done explicitly previously (the conversion p → p * xs @ Cons { head: a; tail: unknown }).

data mutable iterator a (post: perm) = Iterator {
  content: { p: perm } (
    xs: list a,
    rewind: (| consumes (p * xs @ list a)) -> (| post)
    | p
  )
}

The type for next becomes much more readable. In fact, it is exactly the one we hoped to get at the very beginning of the post.

val next [a, post: perm] (consumes it: iterator a post):
  offer post (| it @ iterator a post)

For the function new, the permission p is the neutral permission empty, and rewind needs to do nothing at all:

val new [a] (consumes l: list a): iterator a (l @ list a) =
  Iterator { content = (
    l,
    fun (| consumes l @ list): (| l @ list a) = ()
  )}

We can also write a stop function that stops the iteration:

val stop [a, post: perm] (consumes (it: iterator a post)): (| post) =
  let _, rewind = it.content in
  rewind ()

Note that the rewind function never does anything; it is just used for its effect on the typing environment.

The chatty (and useful) iterator

This is great, we can traverse a list using our iterator. But it would be even more great if we could actually get the contents of the list!

This is a bit more complicated: while giving an element to the user, we have to give him also the permission on it. This breaks the invariant “the iterator always can have post by applying rewind”. Now, our iterator can have a hole in it: when giving an element to the user, a hole appears. To continue the iteration, the user must give the permission on the element back to the iterator.

Consequently, the definition of iterator changes a bit: an iterator is now also parametrized by a permission hole, which in fact means “what does the iterator need to fill its hole and be able to generate post”.

Here is the new definition of iterator. Note that it doesn’t contains hole, but we need it to generate post:

data mutable iterator a (hole: perm) (post: perm) = Iterator {
  content: { p: perm } (
    xs: list a,
    rewind: (| p * hole * l @ list a) -> (| post)
    | p
  )
}

Thus, an iterator without a hole is an iterator a empty post, while an iterator that has given away x @ a to the user is a iterator a (x @ a) post.

We can now write next. It takes an iterator parametrized by any permission hole, the permission hole itself, and implicitly fills the hole by merging hole into p. It finally returns the next element (if any).

val next [a, hole: perm, post: perm] (consumes (it: iterator a hole post | hole)):
  offer post (x: a | it @ iterator a (x @ a) post) =

  let xs, rewind = it.content in
  match xs with
  | Nil ->
    rewind ();
    None
  | Cons { head; tail } ->
    s.content <- tail, rewind
    Some { x = head }
  end

And we can now use this iterator:

(* [nth] takes an iterator [it] and an integer [n], makes him go forward of [n]
   steps, and then returns it (if it hasn't been consumed) *)
val rec nth [a, hole: perm, post: perm]
  (consumes (it: iterator a hole post | hole), n: int):
  offer (x: a | it @ iterator a (x @ a) post) post =

  match next [hole = hole] it with
  | None ->
    None
  | Some { x } ->
    if n <= 0 then (
      Some { x = x }
    ) else (
      nth [a = a, hole = (x @ a)] (it, n-1)
    )
  end

You can note that we have sometimes to instantiate by hand the polymorphic parameters when calling a function. For example, here, when calling recursively nth, we have to say that a previous call to next has created a hole of “shape” x @ a we want to merge back to continue the iteration.

The cherry on top

So, here it is, an iterator on lists!

However, this needs a little cleaning: we store in our iterator a rewind function, which is the same for every iterator, that doesn’t change over time, and is just present to convert permissions.

A way to clean up a bit is to declare a toplevel identity function, named convert:

val convert (): () = ()
alias convertible (p: perm) (q: perm): perm = convert @ (| consumes p) -> (| q)

data mutable iterator a (hole: perm) (post: perm) = Iterator {
  content: {p: perm} (
    l: list a
    | p * convertible (p * hole * l @ list a) post
  )
}

I find that piece of code cute, and I think it enlightens the way the transtyping of rewind works: if rewind can have type (| consumes p) -> (| q) it’s because this is a subtype of () -> (), which means we have convinced the typechecker that p is convertible into q.

I want the code!

I doubt so, but just in case, the complete code, with some dummy examples of applications, can be found there.