I've been investigating the issue further, searching the relevant
literature for previous work on GADTs+subtyping.

This situation is indeed described in the work by Vincent Simonet and
François Pottier, "A Constraint-Based Approach to Guarded Algebraic
Data Types", 2007.

  http://www.normalesup.org/~simonet/publis/index.en.html#simonet-pottier-hmg-toplas

They describe a generic system HMG(X) that adds GADTs on top of an
existing type system optionally supporting subtyping -- in the spirit
of HM(X), a generic study of adding Hindley/Milner parametric
polymorphism to a language. When X is instantiated with (an idealized
view of) the OCaml type system, their work can be used to get
soundness proof for the current GADT implementation.


# The soundness requirement from Simonet & Pottier

Simonet and Pottier prove type soundness of a language with subtyping
extended with GADT under general hypotheses that the underlying time
system must verify. There are three requirements, two saying that
function types must have the correct variance and must not be in
subtyping relation with a GADT type, and one that is relevant to the
present discussion, that says in essence that an "inversion" principle
must hold for all constructors taken independently: for each GADT (ᾱ
t) (ᾱ denotes a sequence of variables α₁,α₂...αn) and each of its
constructor (K : σ(ᾱ)), it must holds in the underlying type system
that (ᾱ t ≤ ᾱ' t) implies (σ(ᾱ) ≤ σ(ᾱ').

More precisely, a GADT constructor K for ᾱ t has a type scheme of the
form
  K : ∀ᾱ∀β̄[D̄] τ̄ → ᾱ t
or, equivalently,
  K : ∀ᾱ (∃β̄[D̄] τ̄) → ᾱ t

The ᾱ are the variables of the parametrized type (ᾱ t), the β̄ are the
existential types for this constructor K, used in the types τ̄ of the
constructor parameter, and D̄ is a set of constraints on those ᾱ and β̄,
equality constraints as with current-ocaml-GADTs or possibly other
forms of constraints, such as subtyping constraints – with the
assumption D is always satisfiable, other K would be a dead
constructor. The "product" GADT in my previous mail would be written:

  type α expr
    Val : ∀α α → α expr
    Prod: ∀α∀β₁β₂[α = β₁*β₂] β₁ expr * β₂ expr → α expr

Formally, the statement of Simonet and Pottier (page 19) says that if
∀ᾱ∀β̄[D̄] τ̄ → ᾱ t and ∀ᾱ'∀β̄'[D̄'] τ̄' → ᾱ t are two alpha-renamed copies of
the type of a single constructor K (I mean ᾱ,β̄ are distinct from ᾱ', β̄')
then the following constraint entailment holds:

  D̄ ∧ (ᾱ t ≤ ᾱ' t) ⊩ ∃β̄'. (D̄' ∧ τ̄ ≤ τ̄')

For example, the (Prod) constructor of the (α expr) type gives rise to
the following requirement:

  α = β₁*β₂ ∧ (α expr ≤ α' expr) ⊩
    ∃β'₁β'₂, (α' = β'₁*β'₂) ∧ (β₁ t ≤ β'₁ t) ∧ (β₂ t ≤ β'₂ t)

I have applied this requirement to the various previously discussed
examples and counter-examples of GADTs parameter variance. If the
requirement holds given some variance annotations, then we know for
sure that the annotations are correct. If the variance annotations
contradict the requirements, then they are incorrect.

The study of examples allowed to formulate a general case criterion as
refinement of Simonet&Pottier requirement in the equality case. The
criterion, however, depends on properties of OCaml's subtyping
relation that is strongly affected, in particular, by the presence of
`private` types. This raises interesting questions on the type system
design and its potential evolution, with several points that would be
worthy of further exploration.

(To lighten the present mail I have separated some technical arguments
to a joint document that you should feel free to skip.)


# A definition of parameter variance

My definition of variance is as follows. The parameters α₁,α₂...an of
a parametrized type ᾱ t can be each assigned a variance annotation
among {+,-,=} (covariance, contravariant and invariant) with the
following meaning:

  (ᾱ t ≤ ᾱ' t) holds *if and only if*, for each variable αi ∈ ᾱ:
    - if it is annotated with +, αi ≤ α'i
    - if it is annotated with -, αi ≥ α'i
    - if it is annotated with =, αi = α'i

We will write (≤i) the order corresponding to the variance of the
parameter αi: (≤), (≥) or (=) if it is respectively (+), (-) or
(=). We can then write the variance fact for the type ᾱ t as:
  (ᾱ t ≤ ᾱ' t)  iff  (∀i, αi ≤i α'i)

## Simple admissible criteria

There are two simple facts to note:
- choosing invariance by default is always safe
- the (more permissive) current OCaml choice for GADTs is safe

If all parameters of a given GADT (ᾱ t) are considered invariant, then
the requirement of Simonet and Pottier holds for all constructors of
this GADT, which means that this choice preserves soundness.

Indeed, given ᾱ, ᾱ' and β̄ we need to prove that:
  ᾱ t ≤ ᾱ t ∧ D̄ ⊩ ∃β̄'. (D̄' ∧ τ̄ ≤ τ̄')

By invariance, we have that for all i, αi = α'i, that is ᾱ = ᾱ'. Then
taking β̄':=β̄ we have D̄=D̄' and τ̄=τ̄', so indeed ᾱ=ᾱ ∧ D̄ ⊩ D̄' ∧ τ̄≤τ̄'

The current OCaml choice is to use the usual variance determination
rules for parameters αi that "are not instantiated", which means that
there is no constraint on αi in D, and mark the other invariant. But
then we have, as in the previous case, D̄=D̄', and τ̄≤τ̄' is guaranteed by
the usual variance check on the non-instantiated αi.


# Simple examples and counter-examples

It is now interesting to have a look at the examples and
counter-examples that were given in the discussion.

## My earlier 'Prod' example

  type +α expr
    Val : ∀α α → α expr
    Prod: ∀α∀β₁β₂[α = β₁*β₂] β₁ expr * β₂ expr → α expr

The type of the 'Val' constructor, which is not instantiated, falls
under the previous criterion: the existing covariance check suffices
to prove the requirement of Simonet&Pottier. For the Prod constructor,
the requirement was:
  α = β₁*β₂ ∧ (α expr ≤ α' expr) ⊩
    ∃β'₁β'₂, (α' = β'₁*β'₂) ∧ (β₁ expr ≤ β'₁ expr) ∧ (β₂ expr ≤ β'₂ expr)

Under covariance assumption (α ≤ α' iff α expr ≤ α' expr) this is
equivalent to:

  α = β*β₂ ∧ α ≤ α' ⊩
    ∃β'₁β'₂, (α' = β'₁*β'₂) ∧ (β₁ ≤ β'₁) ∧ (β₂ ≤ β'₂)

But we "know" (this is an assumption I make about the OCaml
type system) that the following "inversion principle" holds on
tuples: β*β₂ ≤ α' implies that α' is of the form β´₁*β´₂
with β₁≤β´₁, β₂≤β´₂. Hence our requirement is met.

## Jeremy Yallop's 'eq' counter-example

   type (=α,+β) eq
     Refl : ∀αβ∀γ[α=γ,β=γ] (α,β) eq

The requirement derived from this definition is:
  α=γ ∧ β=γ ∧ (α,β) eq ≤ (α',β') eq ⊩ ∃γ', α'=γ' ∧ β'=γ'
Which is equivalent, by the assumed variance annotations, to
  α=γ ∧ β=γ ∧ α=α' ∧ β≤β' ⊩ ∃γ', α'=γ' ∧ β'=γ'
that is
  γ=α' ∧ γ≤β' ⊩ β'=γ
which is false in a model with at least one strict subtyping relation.

Note that (+α,+β) (or any other annotation different from (=,=)) would
also fail, as α≤α', β≤β' and α=β do not entail α'=β': intuitively,
variant parameters must be free to vary "independently".

# The constant case: Upward-closed, downward-closed

Jacques private example is equally interesting:

  type priv_int = private int

  type +α expr
    Priv : ∀α[α=priv_int]. α → α expr

The requirement is:
  α = priv_int ∧ α expr ≤ α' expr ⊩ α' = priv_int ∧ α ≤ α'

Which is equivalent under the covariance assumption to:
  α = priv_int ∧ α ≤ α' ⊩ α' = priv_int

But this is false: for α' := int we have α≤α' yet α' ≠ priv_int.

As the requirement is not met, we have no proof of the soundness of
the system and should reject this choice of variance annotation.

## Covariance and upward-closed types

Note that the above criterion would be true of any "constant
constructor": for a base type `c` not parametrized by α, the
constructor:

  K : ∀α[α=c]. α → α expr

gives birth, under a covariance assumption, to the requirement:

  α = c ∧ α ≤ α' ⊩ α' = c

This requirement exactly says that c is 'upward-closed': all types
bigger than c are equal to c. This is wrong for the type 'priv_int'
above, and probably polymorphic variant types, but true for a lot of
OCaml types: int, int list, etc.

There is something strange happening, where we start doing semantic
distinctions on types that we didn't have to do before; now we care
whether a type is a private type or not, because it affects the
variance of GADTs constructors using it for parameter instantiation.

## There are no downward-closed types

Of course the same phenomenon happens in contravariant position:
constant types instantiating contravariant positions should be
downward-closed. But this is not possible in OCaml type theory, where
no type (except ⊥) is downward-closed because of the pervasive
presence of private types: morally, (private σ) is always strictly
smaller than σ (even if σ is itself a private type). This means that
accepting any constant instance in contravariant position, eg.
  type -α t
    K : ∀α[α=int] (α → int) → α t
would be unsound: eliminating (K (fun n -> n) : priv_int t) would
allow to forge private integers.


# Constructors cases: general inversion

In the general case we have non-constant constraints of the form
(αi = Ti(β̄)), where Ti(β̄) is a type expression having as free
variables a subset of the existential variables of the constructor.
  D̄ ∧ (ᾱ t ≤ ᾱ' t) ⊩ ∃β̄'. (D̄' ∧ τ̄ ≤ τ̄')
which is equivalent, assuming some variance on the ᾱ, to
  D̄ ∧ (∀j, αj ≤j α'j) ⊩ ∃β̄'. (D̄' ∧ τ̄ ≤ τ̄')

Let's assume that, for a given αi, the existential variables appearing
in Ti(β̄) are not used in any other constraint (we will discuss this
requirement later). We can therefore study independently the necessary
sub-requirement
  (αi = Ti(β̄)) ∧ αi ≤i α'i  ⊩ ∃β̄' (α'i = Ti(β̄') ∧ τ̄ ≤ τ̄[β̄'/β̄])
which is equivalent, substituting equalities, to
  Ti(β̄) ≤i α'i ⊢ ∃β̄' (α'i = Ti(β̄') ∧ τ̄ ≤ τ̄[β̄'/β̄])

This is an "eliminability" requirement on type expressions Ti. In the
covariant case for example, it states that all types bigger than Ti(β̄)
are also of the form Ti(β̄') for some β̄'. The requirement further asks
that the variance of the β̄ in Ti coincides with their variance in the
parameter types τ̄ (this is what τ̄ ≤ τ̄[β̄'/β̄] means).

This extends the notion of upward/downward closure for constant
types. A parametrized type (ᾱ t) is "upward closed" if all
types τ such that (ᾱ t ≤ τ) are of the form (ᾱ' t):

  ᾱ t ≤ τ ⊩ ∃ᾱ'(τ = ᾱ' t)

(Combined with the variance information on t, we also know that, for
all satisfying ᾱ', we have (∀i, αi ≤i α'i)).

Again, all the usual OCaml parametrized datatypes are upward closed,
private types are not, and no type is downward closed because of the
presence of private types.

A type expression T(β̄) with head constructor (ᾱ t) is upward closed if
its subexpressions appearing in covariant positions (for t) are upward
closed, and those in contravariant positions are downward closed.


# Existential variable interference

The reasoning in the constructor case was done under the assumptions
that the set of existential variables β̄ appearing in each Ti(β̄) were
distinct. I claim that it is both necessary and sufficient (to satisfy
the requirement) to assume that eventual shared variables are
invariant with respect to the whole GADT type. If one writes the
constructor type in OCaml syntax:

  K : τ̄ → (T₁(β̄),T₂(β̄)...) t

then, if a given β appears in several Ti,Tj, each of its occurrence in
the type expression (T₁(β̄),T₂(β̄)...) t must be in invariant
position. It is easy to show that this ensures soundness, but I found
it harder to get a (draft of) proof that it is also required.

It may be interesting to see examples of interference. In Jeremy
Yallop's equality example
  Refl : ∀β. (β,β) eq
that is
  Refl : ∀α₁α₂∀β[α₁=β,α₂=β] (α₁,α₂) eq
both constraints are covariant in α₁, so the corresponding parameters
must both be invariant: only (=α₁,=α₂) eq is a correct variance
annotation.

A (convoluted) variation of this example is
 type (+α₁,=α₂) refpair
  RefPair : ∀βγ. β * γ ref → (γ ref, β * γ)
or equivalently, in constraint form
  RefPair : ∀α₁α₂∀βγ[α₁ = γ ref, α₂ = γ * β]. β * γ ref → (α₁, α₂) refpair

This case is safe: while the variable γ is shared, it is only used
invariantly in (γ ref, β * γ) t as it is instantiated invariantly in
the first parameter, and the second one is invariant. If
(γ ref, γ * β) refpair ≤ (α'₁, α'₂) refpair, then we have
(γ ref ≤ α'₁), which means (by upward closure and then invariance
of 'ref') that α'₁ = γ ref, and α'₂ = γ * β (by invariance).


# A criterion in the general case

I believe (and have a draft of proof in the attached technical annex)
that the requirement of Simonet&Pottier is equivalent, in the equality
case, to the following criterion that *could* be implemented in
a type-checker – more easily than by adding a general constraint
solver.

  Given variance annotations (≤i) for the type ᾱ t, the type
  constructor
    K : ∀ᾱ∀β̄[α₁ = T₁(β̄), α₂ = T₂(β̄)...] τ̄ → ᾱ t
  or equivalently (in OCaml syntax)
    K : τ̄ → (T₁(β̄),T₂(β̄)...) t
  respects the variance if all the conditions below are met:

  1. Non-interference: if an existential variable β appears in several
     Ti(β̄), then all its occurences in the whole type expression
     (T₁(β̄),T₂(β̄)...) t must be in invariant position.

  2. Upward/downward closure, or eliminability: for each i, β̄, α', if
     (Ti(β̄) ≤i α') holds then α' is of the form Ti(β̄').

  3. Constructor parameters variance: the variance of the β̄ in the
     type expression (T₁(β̄),T₂(β̄)...) t must respect their variance in
     the constructor parameters τ̄ – just like if we had a non-GADT
     declaration in term of the β̄, (type β̄ t = K of τ̄ | ...).

I believe this is sound, not unreasonably difficult to check, and
would help in a lot of concrete cases that are currently rejected.


# Discussion of private types and contravariance

There is something quite disturbing in the idea that the mere
existence of private types creates a strong asymmetry between
covariant and contravariant parameters for GADTs: while upward-closed
types are relatively common (so that we can expect "the usual
examples" of GADTs to have "the expected variance"), there are no more
downward-closed types, which means no contravariant constrained
parameter.

Upward or downward closure can be easily decided on fully transparent
types (types whose definition is known). But there is a problem with
opaque types: as we don't know how they were defined, we can't check
if they are upward/downward closed.

From a design point of view, the "right thing" would be to let people
publish this information in the signature for an opaque type. Just as
we can annotate the type with variance information, we should be able
to publish its upward/downward closure properties.

Another natural question is: what if we added the symmetric of the
"private" concept, a type (blind σ) strictly above σ for all types?
Would we lose all reasonable variance properties on OCaml types?

One idea from Didier Rémy is to allow, when defining a new type 't'
(not a type synonym), to define it as "non-privatizable"
(and correspondingly non-blindable). We or another programmer would
then not be allowed to define a type 'private t'. In exchange for this
lost flexibility, 't' would be downward-closed. The idea is that
introducing private types increased our freedom, and reciprocally
lessened our ability to reason about "what cannot be done";
non-privatizable annotations allow to locally make the inverse choice
of losing 'private' for better subtyping properties.

(Of course, I am not requesting short-term language changes to add
those features.)

From a pragmatic standpoint, I suspect the current restrictions
'private' entails on the GADT variance checks are not going to be
really problematic in practice. We tend to use covariant types
(data) more often than contravariant types (computations). The OCaml
type system is already skewed towards covariance in one place: while
the relaxed value restriction (which, I recall, is the reason why
I needed GADT variance in the first place) would work equally well
to generalize variables that appear only in covariant or only in
contravariant positions, the relaxation is implemented in the
type-checker only for covariant parameters. And I have not yet heard
of someone requesting generalization of only-contravariant
parameters, so apparently treating contravariant parameters as
second-class citizens is ok among OCaml users.


# A theoretical open-world approach

Some people may consider the upward-downward criterion as
fundamentally unsatisfying because of the closed-world assumption it
makes: it is not preserved by enrichment of the OCaml subtyping
lattice.

There is another possibility to get some variance properties out of
GADTs that still works in an open-world system: allow GADTs
constraints to have subtyping constraints, instead of only equality
constraint.

For example we could define the type of well-typed expressions in
this way (note the ≥ in constraints):
  type +α expr =
    | Val : ∀α. α → α expr
    | Prod : ∀α∀βγ[α ≥ β*γ] β*γ → α expr
    | Priv : ∀α[α ≥ priv_int] priv_int → α expr

It is trivial that this definition is covariant in its type parameter,
because covariance was baked in the definition. Indeed, if α≤α'
and α≥β*γ, then α'≥β*γ by transitivity: the constraint is preserved by
going to a larger type. In the soundness requirement of
Simonet&Pottier, one can use the exact same existential variables on
both sides (β̄':=β̄), and the constraints will be satisfied.

The downside of this easier covariance property is that we get weaker
types on deconstruction (pattern matching of GADT values): when
matching on a (α t), in the Prod clause, we only know that (α ≥ β*γ)
rather than (α = β*γ) – note that we could regain the stronger
hypothesis by locally using the closed-world assumption.

This simple idea has two potential problems:

1. This may not be implementable as part of the current OCaml
   type-checker, which uses type equalities internally and has more
   limited subtyping support (in particular we can't abstract over
   arbitrary subtyping assumptions). That said, this particular way of
   adding subtyping hypothesis in the context might be simple enough
   for it to handle.

2. Getting a weaker hypothesis may be unusable: maybe the usual GADT
   programs really make use of the equality assumptions, and going for
   subtyping assumptions would make untypable programs that would be
   perfectly fine under the closed-world assumption.


Jacques can tell us about point 1. Regarding point 2., I'm not sure
and have not thoroughly checked examples, but my intuition is that it
is actually not a problem. I think that the types we naturally want to be
covariant are also the types we want to manipulate as data (so it's ok
if the type is larger than what we expect), rather than consumers, and
conversely.

For example, the idiomatic use of the (+α expr) datatype is the
(eval : ∀α. α expr → α) function, and it can be written on this
monotonous version.

let rec eval : ∀α. α expr → α = function
  | Val (v : α)                        → (v : α)
  | Prod ∃βγ[α≥β*γ] (b:β, c:γ)        → ((b, c) : β*γ :> α)
  | Priv [α ≥ priv_int] (n : priv_int) → (n : priv_int :> α)

One would need to study more examples to get a better understanding of
this question.


# Conclusion and questions

We have a relaxed criterion that we're confident is sound and can
plausibly be implemented in OCaml (including possibly before the
next release). But it relies on a closed-world assumption that is
currently skewed towards covariance because of private types -- not
a problem for our use case, the relaxed value restriction. Finally, it
would suggest some language extensions to benefit fully from it, by
enabling upward/downward closure specification for abstract types, and
allowing to forbid making a new type `private` in the future
(or blind) to gain stronger variance properties.

On the other hand, we know that this criterion would not hold as is in
an open world scenario. Is this consideration relevant to the OCaml
type system? If one wants to keep the door open to the open world
case, would the suggested approach (reasoning under subtyping
constraints in pattern clauses) be implementable?

I don't believe the two choices are exclusive: if we have an
annotation to say "this type will never be marked private", it becomes
downward-closed even in an open-world scenario, and we can recover the
current behavior by saying that all types, by default, "will never be
marked blind" but "may be marked private". Those defaults are design
areas to explore.

I would welcome feedback on the current state of affairs. Given what
we know, is there a hope of relaxing the variance check for GADT at
some short-to-medium-term point?