Gagallium : A toy type language (2) variance 101

In this second installment of the “toy type language” series, we will explain how to compute the variance of mutually recursive parameterized types. The actual implementation will be the topic of a following post.

The interest of variance is best seen in combination with a subtyping relation. We therefore assume a type system equipped with a subtyping relation which we write as ≤. In order to give meaningful examples of subtyping, I will use polymorphic variants, so that readers familiar with that feature of OCaml can follow the discussion easily.

The type of values that may only be the constructor A, which OCaml can express as [ `A ], is a subtype of the type made up of two constructors A and B, written [ `A | `B ]. We write this as [ `A ] ≤ [ `A | `B ]: any inhabitant of the smaller type can be seen as an inhabitant of the bigger type. One can think in terms of sets, in which case ≤ is the set inclusion relation ⊂.

Variance is a property of a parameterized type with regards to one of its parameters. If t a is a parameterized type, the variance of t with regards to a will give us information about the kind of subtyping relation that two types t a and t b enjoy, depending on the subtyping relation between a and b.

Variance is especially important for abstract types. The definition of an abstract type is hidden; therefore, unless otherwise specified, no subtyping relations are available. However, we may want to reveal in a module’s signature that a type has a certain variance, so that the type-checker be aware of certain subtyping relations. Therefore, variance is a property that we will sometimes want to export through a module signature.

The four possible variances

Covariance

type 'a list =
  | Nil
  | Cons of 'a * 'a list

If we see list as a function from types to types, then list is nondecreasing: assuming a ≤ b, we have list a ≤ list b¹. We say that list is covariant in its first parameter. If the type list were to be abstract, revealing that it is a covariant type would make sure the type-checker is aware that “a ≤ b ⇒ list a ≤ list b”.

Contravariance

type 'a printer = 'a -> unit

We now turn to the printer type, which is that of a function that can print elements of type 'a. Intuitively, if a function can print elements with constructor A or B, it can also print elements with only constructors A: the function’s B case will just never be called. In other words, we can see a printer [ `A | `B ] as a printer [ `A ], which we write printer [ `A | `B ] ≤ printer [ `A ]. This fact is true even though [ `A ] ≤ [ `A | `B ]. This is the opposite of list: printer is a nonincreasing function from types to types. We say that printer is contravariant in its first parameter.

Invariance

type 'a list_with_printer = 'a printer * 'a list

Here, the list_with_printer type is made up of a covariant type and a contravariant type; list_with_printer is neither co- nor contravariant. We say that it is invariant, meaning the list_with_printer does not enjoy any subtyping relation. This is the default variance: an abstract type is considered to be invariant by default.

Bivariant

type 'a t_bi = ()

A rarely useful variance is bivariance, when a type is both co- and contravariant. This is only possible if the type does not use its type parameter, such as t_bi in the example above. Indeed, we always have both t_bi a ≤ t_bi b and t_bi b ≤ t_bi a for any a and b.

OCaml notation

We now need to reveal the variance of an abstract type through a module signature, so that the type-checker can take advantage of this extra information. OCaml offers notation in module signatures for all variances, except bivariance.

module M: sig
  type +'a co_t
  type -'a contra_t
  type 'a inv_t
end = struct
  type 'a co_t = 'a list
  type 'a contra_t = 'a printer
  type 'a inv_t = 'a printer * 'a list
end

The variance lattice

Variance forms a complete lattice, which induces a “meet” (“goes down”) and a “join” (“goes up”) operation, which we respectively write as ∩ and ∪.

       inv
      /   \
    co   contra
      \   /
        bi

Computing variance

We now turn to the actual process of computing variance. Let’s assume first that a recursive procedure is enough to compute the variance of a given data type.

First example

Consider the list_with_printer type. When trying to compute variance(list_with_printer, a), we need to compute variance(list, a) and variance(printer, a). Assuming the variances for list and printer have been computed already, we obtain co and contra. Then, the variance of list_with_printer is one that subsumes both the variance of list and printer, as the two appear in the definition of list_with_printer. Finding a variance that is greater than both co and contra corresponds to a “join” operation on the lattice.

In short,

variance(list_with_printer, a) = variance(list, a) ∪ variance(printer, a)
                               = co ∪ contra
                               = inv

Recursion

For the example above, a simple recursive computation may suffice. However, in the presence of recursion, computing the variance of list requires that one knows the variance of list already, as list occurs in its own definition!

The variance for list can be computed using a least fixed point: we initially assume the variance of list to be the “best” possible (bivariant). Under this hypothesis, we compute the variance of list, which turns out to be co.

We then perform a second iteration: assuming list is now covariant, the computed variance for list is still co: we have reached a fixed point. Starting from the bottom of the lattice guarantees we reach the least fixed point: indeed, invariant is a valid fixed point for list, but not the best (least) one.

We can sum up this procedure as follows, where the fixed point is not reached until the second line.

variance(list, a): bi ⊢ variance (list, a) = co
variance(list, a): co ⊢ variance (list, a) = co

Systems of equations

We can express variance using a system of equations. Back to the list_with_printer example, if v is the variance associated to the parameter a of list_with_printer, we write:

v = variance(list_with_printer, a)

How do we compute the variance? We look up the definition of list_with_printer, and figure out that this type is defined to be a tuple. Computing variance for a tuple is easy; we just apply the “join” operation.

(* [variance t a] computes the variance of parameter [a] in type [t] *)
variance :: type -> parameter -> variance
variance((t, u), a) = variance (t, a) ∪ variance (u, a)

This reduces to:

v = co ∪ contra

Solving this equation is straightforward and the fixed-point computation is trivial: inv is the least fixed point of that equation, therefore, list_with_printer is invariant over its first parameter.

Negation

We have shown the definition of the variance function in the case of tuples. Another easy case is that of a variable:

variance(a, a) = co
variance(b, a) = bi

The variance of a in a itself is covariant; the variance of a in another, distinct type variable is bi, as a does not appear in type b.

Arrow types are __contra__variant in their domain: if a appears in covariant position in t, a is __contra__variant in t -> unit. We therefore introduce a negation operator, which we write -, defined as follows:

-inv    = inv
-co     = contra
-contra = co
-bi     = bi

variance(t -> u, a) = - variance(t, a) ∪ variance (u, a)

Composition

We still haven’t defined the variance function in the case of type applications, which we write t u. Computing the variance of a in t u requires composing the variance of t in its first parameter and the variance of a in u.

Still seeing data types as functions, this amounts to computing the monotonicity of the composition of two functions.

If v is the variance of t in its first parameter, and w is the variance of a in u, then we define variance as follows:

variance(t u, a) = var_for_param(t, 0) . variance(u, a)

We compose the variance of t in its first parameter with the variance of a in u.

How should we define the dot operator? Let us take some examples. If variance(u, a) = w, then variance(list u, a) remains w: we say that co.w = w. Conversely, variance(printer u, a) is -w: we have contra.w = -w. An interesting case if that of bivariance: if a does not appear in u, then a won’t appear in t u, meaning v.bi = bi. Symmetrically, if t does not use its parameter u, then a won’t appear in t u, meaning bi.w = bi. Finally, composing an invariant type with any other type yields an invariant type.

The composition of v and w, which we write as v.w, is summed up in the following table.

`v.w`	inv	co	contra	bi
inv	inv	inv	inv	bi
co	inv	co	contra	bi
contra	inv	contra	co	bi
bi	bi	bi	bi	bi
`v`

This allows us to write the system of equations for the list type:

v = (co ∪ v.co) ∪ bi

Mutually recursive data types

The fixed-point computation becomes much more interesting in the case of multiple, mutually recursive data types. Consider the following definition:

type ('a, 'b) t =
  | Foo of ('a -> 'b) 
  | Bar of ('a * 'b, 'b) u
and ('c, 'd) u =
  | Baz of ('c, 'c) t

The system of equations needs four variables, va, vb, vc and vd which are the variances of all four type parameters.

va = -co ∪ bi ∪ vc.co ∪ vd.bi
vb = -bi ∪ co ∪ vc.co ∪ vd.co
vc = va.co ∪ vb.co
vd = va.bi ∪ vb.bi

We have now explained how to write down the recursive equations that allow one to compute the variance of mutually recursive data types. The next and final blog post will show how to compute the least fixed point for any set of equations, using the Fix library.