## A toy type language (2) variance 101

- January 17, 2014
- Last updated on 2014/01/17

In this second installment of the “toy type language” series, we will explain how to compute the variance of mutually recursive parameterized types. The actual implementation will be the topic of a following post.

The interest of variance is best seen in combination with a subtyping
relation. We therefore assume a type system equipped with a subtyping
relation which we write as `≤`

. In order to give meaningful
examples of subtyping, I will use *polymorphic variants*, so that
readers familiar with that feature of OCaml can follow the discussion
easily.

The type of values that may only be the constructor `A`

,
which OCaml can express as `[ `A ]`

, is a subtype of the type
made up of two constructors `A`

and `B`

, written
`[ `A | `B ]`

. We write this as
`[ `A ] ≤ [ `A | `B ]`

: any inhabitant of the smaller type
can be seen as an inhabitant of the bigger type. One can think in terms
of sets, in which case `≤`

is the set inclusion relation
`⊂`

.

Variance is a property of a parameterized type with regards to one of
its parameters. If `t a`

is a parameterized type, the
variance of `t`

with regards to `a`

will give us
information about the kind of subtyping relation that two types
`t a`

and `t b`

enjoy, depending on the subtyping
relation between `a`

and `b`

.

Variance is especially important for *abstract* types. The
definition of an abstract type is hidden; therefore, unless otherwise
specified, no subtyping relations are available. However, we may want to
reveal in a module’s signature that a type has a certain variance, so
that the type-checker be aware of certain subtyping relations.
Therefore, variance is a property that we will sometimes want to export
through a module signature.

### The four possible variances

#### Covariance

```
type 'a list =
| Nil
| Cons of 'a * 'a list
```

If we see `list`

as a function from types to types, then
`list`

is nondecreasing: assuming `a ≤ b`

, we have
`list a ≤ list b`

^{1}. We say that
`list`

is **covariant** in its first parameter.
If the type `list`

were to be abstract, revealing that it is
a covariant type would make sure the type-checker is aware that
“`a ≤ b ⇒ list a ≤ list b`

”.

#### Contravariance

```
type 'a printer = 'a -> unit
```

We now turn to the `printer`

type, which is that of a
function that can print elements of type `'a`

. Intuitively,
if a function can print elements with constructor `A`

or
`B`

, it can also print elements with only constructors
`A`

: the function’s `B`

case will just never be
called. In other words, we can see a `printer [ `A | `B ]`

as
a `printer [ `A ]`

, which we write
`printer [ `A | `B ] ≤ printer [ `A ]`

. This fact is true
even though `[ `A ] ≤ [ `A | `B ]`

. This is the opposite of
`list`

: `printer`

is a nonincreasing function from
types to types. We say that `printer`

is
**contravariant** in its first parameter.

#### Invariance

```
type 'a list_with_printer = 'a printer * 'a list
```

Here, the `list_with_printer`

type is made up of a
covariant type and a contravariant type; `list_with_printer`

is neither co- nor contravariant. We say that it is
**invariant**, meaning the `list_with_printer`

does not enjoy any subtyping relation. This is the default variance: an
abstract type is considered to be invariant by default.

#### Bivariant

```
type 'a t_bi = ()
```

A rarely useful variance is **bivariance**, when a type
is **both** co- and contravariant. This is only possible if
the type does not use its type parameter, such as `t_bi`

in
the example above. Indeed, we always have both
`t_bi a ≤ t_bi b`

**and**
`t_bi b ≤ t_bi a`

for any `a`

and
`b`

.

### OCaml notation

We now need to reveal the variance of an abstract type through a module signature, so that the type-checker can take advantage of this extra information. OCaml offers notation in module signatures for all variances, except bivariance.

```
module M: sig
type +'a co_t
type -'a contra_t
type 'a inv_t
end = struct
type 'a co_t = 'a list
type 'a contra_t = 'a printer
type 'a inv_t = 'a printer * 'a list
end
```

### The variance lattice

Variance forms a complete lattice, which induces a “meet” (“goes
down”) and a “join” (“goes up”) operation, which we respectively write
as `∩`

and `∪`

.

```
inv
/ \
co contra
\ /
bi
```

## Computing variance

We now turn to the actual process of computing variance. Let’s assume first that a recursive procedure is enough to compute the variance of a given data type.

### First example

Consider the `list_with_printer`

type. When trying to
compute `variance(list_with_printer, a)`

, we need to compute
`variance(list, a)`

and `variance(printer, a)`

.
Assuming the variances for `list`

and `printer`

have been computed already, we obtain `co`

and
`contra`

. Then, the variance of
`list_with_printer`

is one that *subsumes* both the
variance of `list`

and `printer`

, as the two
appear in the definition of `list_with_printer`

. Finding a
variance that is greater than both `co`

and
`contra`

corresponds to a “join” operation on the
lattice.

In short,

```
variance(list_with_printer, a) = variance(list, a) ∪ variance(printer, a)
= co ∪ contra
= inv
```

### Recursion

For the example above, a simple recursive computation may suffice.
However, in the presence of recursion, computing the variance of
`list`

requires that one knows the variance of
`list`

already, as `list`

occurs in its own
definition!

The variance for `list`

can be computed using a
**least fixed point**: we initially assume the variance of
`list`

to be the “best” possible (bivariant). Under this
hypothesis, we compute the variance of `list`

, which turns
out to be `co`

.

We then perform a second iteration: assuming `list`

is now
covariant, the computed variance for `list`

is still
`co`

: we have reached a **fixed point**.
Starting from the bottom of the lattice guarantees we reach the
**least** fixed point: indeed, invariant is a valid fixed
point for `list`

, but not the best (least) one.

We can sum up this procedure as follows, where the fixed point is not reached until the second line.

```
variance(list, a): bi ⊢ variance (list, a) = co
variance(list, a): co ⊢ variance (list, a) = co
```

### Systems of equations

We can express variance using a system of equations. Back to the
`list_with_printer`

example, if `v`

is the
variance associated to the parameter `a`

of
`list_with_printer`

, we write:

```
v = variance(list_with_printer, a)
```

How do we compute the variance? We look up the definition of
`list_with_printer`

, and figure out that this type is defined
to be a tuple. Computing variance for a tuple is easy; we just apply the
“join” operation.

```
(* [variance t a] computes the variance of parameter [a] in type [t] *)
variance :: type -> parameter -> variance
variance((t, u), a) = variance (t, a) ∪ variance (u, a)
```

This reduces to:

```
v = co ∪ contra
```

Solving this equation is straightforward and the fixed-point
computation is trivial: `inv`

is the least fixed point of
that equation, therefore, `list_with_printer`

is invariant
over its first parameter.

### Negation

We have shown the definition of the `variance`

function in
the case of tuples. Another easy case is that of a variable:

```
variance(a, a) = co
variance(b, a) = bi
```

The variance of `a`

in `a`

itself is covariant;
the variance of `a`

in another, distinct type variable is
`bi`

, as `a`

does not appear in type
`b`

.

Arrow types are __contra__variant in their domain: if `a`

appears in covariant position in `t`

, `a`

is
__contra__variant in `t -> unit`

. We therefore introduce a
negation operator, which we write `-`

, defined as
follows:

```
-inv = inv
-co = contra
-contra = co
-bi = bi
```

```
variance(t -> u, a) = - variance(t, a) ∪ variance (u, a)
```

### Composition

We still haven’t defined the `variance`

function in the
case of type applications, which we write `t u`

. Computing
the variance of `a`

in `t u`

requires
**composing** the variance of `t`

in its first
parameter and the variance of `a`

in `u`

.

Still seeing data types as functions, this amounts to computing the monotonicity of the composition of two functions.

If `v`

is the variance of `t`

in its first
parameter, and `w`

is the variance of `a`

in
`u`

, then we define `variance`

as follows:

```
variance(t u, a) = var_for_param(t, 0) . variance(u, a)
```

We **compose** the variance of `t`

in its
first parameter with the variance of `a`

in
`u`

.

How should we define the dot operator? Let us take some examples. If
`variance(u, a) = w`

, then `variance(list u, a)`

remains `w`

: we say that `co.w = w`

. Conversely,
`variance(printer u, a)`

is `-w`

: we have
`contra.w = -w`

. An interesting case if that of bivariance:
if `a`

does not appear in `u`

, then `a`

won’t appear in `t u`

, meaning `v.bi = bi`

.
Symmetrically, if `t`

does not use its parameter
`u`

, then `a`

won’t appear in `t u`

,
meaning `bi.w = bi`

. Finally, composing an invariant type
with any other type yields an invariant type.

The composition of `v`

and `w`

, which we write
as `v.w`

, is summed up in the following table.

`v.w` |
inv | co | contra | bi | `w` |
---|---|---|---|---|---|

inv | inv | inv | inv | bi | |

co | inv | co | contra | bi | |

contra | inv | contra | co | bi | |

bi | bi | bi | bi | bi | |

`v` |

This allows us to write the system of equations for the
`list`

type:

```
v = (co ∪ v.co) ∪ bi
```

### Mutually recursive data types

The fixed-point computation becomes much more interesting in the case of multiple, mutually recursive data types. Consider the following definition:

```
type ('a, 'b) t =
| Foo of ('a -> 'b)
| Bar of ('a * 'b, 'b) u
and ('c, 'd) u =
| Baz of ('c, 'c) t
```

The system of equations needs four variables, `va`

,
`vb`

, `vc`

and `vd`

which are the
variances of all four type parameters.

```
va = -co ∪ bi ∪ vc.co ∪ vd.bi
vb = -bi ∪ co ∪ vc.co ∪ vd.co
vc = va.co ∪ vb.co
vd = va.bi ∪ vb.bi
```

.

We have now explained how to write down the recursive equations that
allow one to compute the variance of mutually recursive data types. The
next and final blog post will show how to compute the least fixed point
for any set of equations, using the `Fix`

library.

I am intentionally

*not*using the OCaml notation here, but the more conventional one usually found in papers, where data types are treated just like functions applied to arguments.↩︎