Gagallium : Rethinking OCaml abstract signatures

Abstract module types are one of the less understood features of the OCaml module system. They have been one of the obstacles in the on-going effort to specify, and eventually redesign, the module system.

In this blog post, I (Clément Blaudeau) present an explanation of what are those abstract module types, and propose a slightly restricted version that might be easier to understand and specify while remaining pretty expressive.

For the past 2 years, I’ve been working on building a new specification for the OCaml module system based on Fω. The goal, besides the theoretical interest, is to eventually redo (the module part of) the typechecker with this approach, which would have several benefits:

fix some soundness issues and edge-cases that have appeared and built up over the years, due to unforeseen interactions between features
simplify the (notoriously hard) code of the typechecker by removing ad-hoc techniques and hacks (such as the strengthening or the treatment of aliases for instance)
provide a clean base to add new and awaited features. Notably, transparent ascription and modular implicits are proposals for OCaml modules stalled by the lack of specification of the module system.

Yet, a key aspect in OCaml development culture is to ensure backward compatibility. Therefore, the new Fω approach I’ve been building should not only subsumes the current typechecker in normal use cases, but actually support all of the features of the module system. For long, abstract signatures (also called abstract module types) were believed to be, at least, problematic for Fω. Hopefully, we found out that a slightly restricted version of the feature was encodable in Fω, and, in passing, made the semantics of abstract signatures much simpler. Thus, only one question remains: does this restricted form actually covers all use cases, i.e., is the restriction backward compatible ?

Here, we aim at presenting the current state of abstract signatures and our proposed simplification purely from an OCaml user point of view, not from the theoretical one. We welcome any feedback, specifically, use cases or potential use cases that significantly differ from our examples.

We start by introducing abstract signatures through examples. Then, we present the current state of abstract signatures in OCaml: we explain the syntactic approach and the issues associated with it. We argue that it has surprising behaviors and, in its current unrestricted form, it is actually too powerful for its own good. Then, we propose a restriction to make the system predicative which, by decreasing its expressiveness, actually makes it more usable. (Our actual proposal is given in 3.3). We finish by other aspect related to usability (syntax, inference).

1. What are abstract signatures ?

The art of modularity is all about controlling abstraction and interfaces. ML languages offer this control via a module system, which contains a signature language to describe interfaces. Signatures contain declarations (fields): values val x : t, types type t = int, modules module X : S, and module types module type T = S. Type and module type declarations can also be abstract type t, module type T, which serves both to hide implementation details via sealing and to have polymorphic interfaces, using functors.

Here, we focus on the construct module type T, called abstract module type or abstract signature. We start with examples adapted from this forum discussion.

1.1 Module-level sealing

Let’s consider the following scenario. Two modules providing an implementation of UDP (UDP1 and UDP2) are developed with different design trade-offs. They both implement a signature with basic send and receive operations. Then, functors are added as layers on top: taking a udp library as input, they return another udp library as an output.

Reliable adds sequence numbers to the packets and re-sends missing packets;
CongestionControl tracks the rate of missing packets to adapt the throughput to network congestion situations;
Encryption encrypts the content of all messages.

A project might need different combinations of the basic libraries and functors, while requiring that all combinations use encryption. To enforce this, the solution is to use the module-level sealing of abstract signatures. In practice, the signature of the whole library containing implementations and functors UDPLib (typically, its .mli file) is rewritten to abstract all interfaces except for the output of the Encryption functor.

module type UDPLib = sig
  module type UNSAFE

  module UDP1 : UNSAFE
  module UDP2 : UNSAFE

  module Reliable : UNSAFE -> UNSAFE
  module CongestionControl : UNSAFE -> UNSAFE

  module Encryption : UNSAFE ->
    sig val send : string -> unit (* ... *) end
end

Just as type abstraction, signature abstraction can be used to enforce certain code patterns: users of UDPLib will only be able to use the content of modules after calling the Encryption functor, and yet they have the freedom to choose between different implementations and features:

module UDPKeyHandshake = Encryption(Reliable(UDP1))
module UDPVideoStream  = Encryption(CongestionControl(UDP2))
(* etc *)

1.2 Module-level polymorphism

Another use is to introduce polymorphism at the module level. Just as polymorphic functions can be used to factor code, module-level polymorphic functors can be used to factor module expressions. If a code happens to often feature functor applications of the form Hashtbl.Make(F(X)) or Set.Make(F(X)), one can define the MakeApply functor as follows:

(* Factorizing common expressions *)
module type Type = sig module type T end
module MakeApply
  (A:Type) (X: A.T)
  (B:Type) (F: A.T -> B.T)
  (C:Type) (H: sig module Make : B.T -> C.T end) = H.Make(F(X))

Downstream the code is rewritten into MakeApply(...)(X)(...)(F)(...)(Set) or MakeApply(...)(X)(...)(F)(...)(Hashtbl) Right now, the verbosity of such example would probably be a deal-breaker. We address this aspect at the end. Ignoring the verbosity, this can be useful for maintenance: by channeling all applications through MakeApply, only one place needs to be updated if the arity or order of arguments is changed. Similarly, if several functors expect a constant argument containing – for instance – global variables, a ApplyGv functor can be defined to always provide the right second argument, which can even latter be hidden away to the user of ApplyGv:

(* Constant argument *)
module Gv : GlobalVars
module ApplyGv (Y : sig module type A module type B end)
    (F : Y.A -> GlobalVars -> Y.B)(X : Y.A) = F(X)(Gv)

Downstream, code featuring F(X)(GlobalVars) is rewritten into ApplyGv(...)(F)(X) Then, the programmer can hide the GlobalVars module while letting users use ApplyGv, ensuring that global variables are not modified in uncontrolled ways by certain part of the program.

Finally, polymorphism can also be used by a developer to prevent unwanted dependencies on implementation details. If the body of a functor uses an argument with a submodule X, but actually does not depend on the content of S, abstracting it is a “good practice”.

module F (Arg : sig ... module X : S ... end) =
  ... (* polymorphism is not enforced *)

module F' (Y: sig module type S end)
    (Arg : sig ... module X : Y.S ... end ) =
  ... (* polymorphism is enforced *)

1.3 So it’s just normal use cases of abstraction ?

Fundamentally, these example are not surprising for developers that are used to rely on abstraction to protect invariants and factor code. Their specificity lies in the fact that there are at the module level, and therefore require projects with a certain size and a strong emphasis on modularity to be justified.

2. OCaml’ abstract signatures, an incidental feature ?

The challenge for understanding (and implementing) abstract signatures lies more in the meaning of the module-level polymorphism that they offer than the module level sealing, the latter being pretty straightforward. More specifically, the crux lies in the meaning of the instantiation of an abstract signature variable A by some other signature S, that happens when a polymorphic functor is applied. OCaml follows an unrestricted syntactical approach: A can be instantiated by any (well-formed) signature S. During instantiation, all occurrences of A are just replaced by S ; finally, the resulting signature is re-interpreted—as if it were written as is by the user.

However, this syntactical rewriting interferes with the variant interpretation of signatures, which can lead to surprising behaviors. We discuss this aspect first. The unrestricted aspect leads to the (infamous) Type : Type issue which has some theoretical consequences. We finish this section by mentioning other—more technical—issues.

2.1 Syntactical rewriting

The first key issue of this approach comes from the fact that signatures in OCaml have a variant interpretation: abstract fields (1) have a different meaning (sealing or polymorphism) depending on whether they occur in positive or negative positions, and (2) abstract fields open new scopes, i.e. duplicating an abstract type field introduces two different abstract types. Overall, OCaml signatures can be thought of as having implicit quantifiers: using a signature in positive or negative position changes its implicit quantifiers (from existential to universal) while duplicating a signature duplicates the quantifiers (and therefore introduces new incompatible abstract types).

Therefore, when instantiating an abstract signature with a signature that has abstract fields, the user must be aware of this, and mentally infer the meaning of the resulting signature. To illustrate how it can be confusing, let’s revisit the first motivating example and let’s assume that the developer actually want to expose part of the interface of the raw UDP libraries. One might be tempted to instantiate UNSAFE with something along the following lines:

module type UDPLib_expose = sig
  include UDPLib with module type UNSAFE =
    sig
      module type CORE_UNSAFE
      module Unsafe : CORE_UNSAFE (* this part remains abstract *)
      module Safe : sig ... end (* this part is exposed *)
    end
end

This returns :

module type UDPLib_expose =  sig
  module type UNSAFE =
    sig
      module type CORE_UNSAFE
      module Unsafe : CORE_UNSAFE
      module Safe : sig ... end
    end
  module UDP1 : UNSAFE
  module UDP2 : UNSAFE
  module Reliable : UNSAFE -> UNSAFE
  module CongestionControl : UNSAFE -> UNSAFE
  module Encryption : UNSAFE -> sig val send : string -> unit (* ... *) end
end

However, the syntactical rewriting and reinterpretation of this signature in the negative positions produces a counter-intuitive result. For instance, if we expand the signature of the argument for the functor Reliable (for instance) we see:

module Reliable :
sig
  module type CORE_UNSAFE
  module Unsafe : CORE_UNSAFE
  module Safe : sig ... end
end -> UNSAFE

This means that the functor actually has to be polymorphic in the underlying implementation of CORE_UNSAFE, rather than using the internal details, which has the opposite meaning as before. If the user wants to hide a shared unsafe core, accessible to the functor when they were defined and then abstracted away, the following pattern may be used instead:

module type UDPLib_expose' = sig
  module type CORE_UNSAFE
  include UDPLib with module type UNSAFE = sig
    module type CORE_UNSAFE = CORE_UNSAFE
    module Unsafe : CORE_UNSAFE
    module Safe : sig ... end
  end
end

Doing so, the instantiated signature does not contain abstract fields and therefore its variant reinterpretation will not introduce unwanted polymorphism. This observation is at the core of the proposal of this post.

2.2 `Type : Type` and impredicativity

Abstract module types are impredicative: a signature containing an abstract signature can be instantiated by itself. One can trick the subtyping algorithm into an infinite loop of instantiating an abstract signature by itself, as shown by Andreas Rosseberg, adapting an example from Harper and Lillibridge (POPL ’94). This also allows type-checking of (non-terminating) programs with an absurd type, as shown by the encoding of the Girard’s paradox done by Leo White.

2.3 Other issues

The current implementation of the typechecker does not handle abstract signatures correctly in some scenarios. It’s unclear if they are just bugs or pose theoretical challenges.

2.3.1 Invalid module aliases

Inside a functor, module aliases are disallowed between the parameter and the body (for soundness reasons, due to coercive subtyping). However, this check can be bypassed by using an abstract signature that is then instantiated with an alias. If we try to use it to produce a functor that exports its argument as an alias, the typechecker crashes. This is discussed in #11441

(* crashes the typechecker in current OCaml *)
module F (Type : sig module type T end)(Y : Type.T) = Y

module Crash (Y : sig end) =
  F(struct module type T = sig module X = Y end end)

2.3.2 Loss of applicativity

The use of abstract signatures clashes with applicativity of functors, as discussed in #12204.

2.3.3 Invalid signatures and avoidance

Another known issue is that the typechecker can abstract a signature when it contains unreachable type fields (types pointing to anonymous modules). This can lead to the production of invalid signatures : signatures that are refused by the typechecker when re-entered back in.

module F (Y: sig type t end) =
  struct
    module type A = sig
      type t = Y.t (* this will force the abstraction of all of A *)
      type u
    end
    module X : A = struct type t = Y.t type u = int end
    type u = X.u
  end

module Test = F(struct type t end)

(* returns *)
module Test : sig module type A module X : A type u = X.u end

Here, the type field type u = X.u is invalid as X has an abstract signature (and therefore, no fields).

3. A solution: simple abstract signatures

In this section we explore solutions for fixing the issues of the current approach. The core criticism we make of the OCaml approach is that it is actually too expressive for its own good. Abstract signatures are impredicative: they can be instantiated by themselves. Having impredicative instantiation with variant reinterpretation is hard to track for the user and interacts in very subtle ways with other features of the module system, slowing down its development—and breaking its theoretical properties. To address this, we take the opposite stance and propose to make the system actually predicative: we restrict the set of signatures that can be used to instantiate an abstract signature. This also indirectly addresses the complexity of the variant reinterpretation.

We start with the simplest solution where instantiation of abstract signatures is restricted to signatures containing no abstract fields. Then, we propose to relax this restriction and allow for signatures that contain abstract type fields (but no abstract module types), which we call simple signatures. This will requires us to briefly discuss the need for module-level sharing.

In this section we focus on the theoretical aspects, but present them informally with examples. The practical aspects, notably syntax and inference, are discussed in the next section.

3.1 No abstraction

One might wonder why abstract types and abstract signatures syntactically resembles one another and yet, the latter is much more complex than the former. The key lies in the fact that abstract types can only be instantiated by concrete type expressions, without free variables. Informally, this:

sig
  type t
  val x : t
  val f : t -> t
end with type t = (int * 'a)

is not allowed, notably because (1) the scope of the abstract type variable 'a is unclear, (2) values of type t, like x, would be ill-typed.

Therefore, a first solution is to require abstract signatures to be instantiated only by concrete signatures, i.e. signatures with no abstract fields (neither types nor module types). This circumvents the clash between the rewriting and variant reinterpretation of abstract fields (by disallowing them).

This is simple and sound but prevents some valid uses of abstract types: in the first example, UNSAFE could not be instantiated with abstract type fields, forcing UDP1 and UDP2 to have the same type definitions.

If we want to relax the no-abstraction proposal, some abstract fields will be allowed when instantiating signatures. Then, the question of what sharing (i.e., type equalities) should be kept between different occurrences of the abstract fields arises.

In OCaml signatures, sharing between two modules is usually expressed at the core-level by rewriting the fields of the signature of the second module to refer to their counterpart in the first one. This cannot be done with abstract signatures, as they have no fields. Instead, the language needs module-level sharing, which in OCaml is very restricted. Indeed, it provides a form of module aliases (only for submodules, not at the top-level of a signature), but aliasing between a functor body and its parameter is not allowed—while it is typically the use-case for abstract signatures in polymorphic functors. Consider the following code:

(* Code *)
module F1 (Y: sig module type A module X : A end) = Y.X
module F2 (Y: sig module type A module X : A end) = (Y.X : Y.A)

Currently, the typechecker cannot distinguish between the two and returns the same signature, while we would expect the first one to keep the sharing between the parameter and the body.

(* Currently, both are given the same type: *)
module F1 (Y: sig module type A module X : A end) : A
module F2 (Y: sig module type A module X : A end) : A

As an example, we can consider the argument for the functors:

module Y = struct
  module type A = sig type t end
  module X = struct type t = int end
end

module Test1 = F1(Y)
module Test2 = F2(Y)

This returns :

module Test1 : sig type t end
module Test2 : sig type t end

While we would expect :

module Test1 : sig type t = int end
module Test2 : sig type t end

Two possible extensions would help tackle this issue.

Lazy strengthening

A recently proposed experimentation, named lazy strengthening, extends the signature language with an operator S with P, where S is a signature and P a module path. It is interpreted as S strengthened by P, i.e. S in which all abstract fields are rewritten to point to their counterpart in P. Initially considered for performance reasons, it would allow for tracking of type equalities when using abstract signatures.

(* Lazy strengthening would keep type equalities: *)
module F1 (Y: sig module type A module X : A end) = Y.A with Y.X

Transparent ascription

A more involved solution is the use of an extension of aliasing called transparent ascription, where both the alias and the signature are stored in the signature. The signature language would be extended with an operator (= P < S). The technical implications of this choice are beyond the scope of this discussion.

(* Transparent ascription would keep module equalities: *)
module F1 (Y: sig module type A module X : A end) : (= Y.X < Y.A)

3.3 Our proposal : simple abstract signatures

Maintaining a predicative approach, we propose to restrict instantiation only by simple signatures, i.e., signatures that may contain abstract type fields, but no abstract module types. This reintroduces the need to express module-level sharing and the mental gymnastic of variant re-interpretation of abstract type fields. However, it guarantees that all modules sharing the same abstract signature will also share the same structure (same fields) after instantiation, and can only differ in their type fields. We believe this makes for a good compromise.

Expressivity (and prenex-form)

One might wonder how restrictive is this proposal. Specifically, if we consider a simple polymorphic functor as:

module Apply (Y : sig module type A end) (F : Y.A -> Y.A)(X : Y.A) = F(X)

The following partial application would be rejected:

(* Rejected as A would be instantiated by `sig module type B module X : B -> B end` *)
module Apply' = Apply(struct module type A = sig module type B module X : B -> B end end)

However, this could be circumvented by eta-expanding, thus expliciting module type parameters, and instantiating only a simple signature:

(* Accepted as A is instantiated by a signature with no abstract fields *)
module Apply'' = functor (Y:sig module type B end) ->
  Apply(struct module type A = sig module type B = Y.B module X : B -> B end end)

Higher-abstraction ?

Concrete and simple signatures can be seen as the first two levels of the predicative approach for types declarations. There are no more levels for type declarations, as types cannot be partially abstract (see 3.1). Could it be useful to add even more expressivity and authorize instantiation by a signature containing again an abstract module type field (which would need to be restricted with a level system like universes)? We have found no example where this was useful. Besides, it would add a great layer of complexity.

4. Other practical aspects

Syntax

A key aspect of abstract module types that reduces their usability is the verbosity of the syntax. Rather than having to pass signature as part of a module argument to a polymorphic functor, using a separate notation for module type parameters could be more concise. In practice, abstract signature arguments could be indicated by using brackets instead of parenthesis, and interleaved with normal module arguments, as in this example:

(* At definition *)
module MakeApply
    [A] (X:A)
    [B] (F: A -> B)
    [C] (H : sig module Make : B -> C end)
  = H.Make(F(X))

module ApplyGv
    [A] [B] (F:A -> GlobalVars -> B) (X:A)
  = F(X)(Gv)

(* At the call site *)
module M1 = MakeApply
    [T] (X)
    [Hashtbl.HashedType] (F)
    [Hashtbl.S] (Hashtbl)

module M2 = ApplyGv [A] [B] (F) (X)

Technically, this is not just syntactic sugar for anonymous parameters due to the fact that OCaml relies on names for applicativity of functors.

Inference

Following up on the previous point, usability of abstract signatures could even be improved with some form of inference at call sites. Further work is needed to understand to what extend this could be done.

Conclusion

We have presented the feature of abstract signatures in OCaml. After showing use cases via examples, we explained the issues associated with the unrestricted syntactical approach. Then, we propose a new specification: simple abstract signatures. In addition to making the behavior of abstract signatures much more predictable for the user, this approach can be fully formalized by translation into Fω (extended with predicative kinds).

As stated above, our goal here was both to sum up the current state and our proposal, but also to gather feedback from users or potential users. In particular, we want to see if it can indeed cover all use cases, and if we missed other usability problems.