Gagallium : Florian's OCaml compiler weekly, 17 April 2023

This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) daily work on the OCaml compiler. This week, the focus is on the first alpha release of OCaml 5.1.0 and some discussions with the ocamlformat team.

First alpha release for OCaml 5.1.0

Between Friday and Saturday, I published the first alpha for OCaml 5.1.0. As the first version of OCaml 5 published after the feature freeze for OCaml 5, this version feels like a midpoint between the usual release process for OCaml 4 and the experimental release of OCaml 5.0.0.

In particular, this release will integrate many features that were either frozen during the development of OCaml 5 or merged in the development version after the branch for OCaml 5.0 was cut. For instance, the support for RISC-V was merged in July last year, but it will only be available with OCaml 5.1 around next July.

Conversely, the development window for contributors who were busy with OCaml 5.0.0 bug fixing was especially short, since there were only four months between the OCaml 5.0.0 release and the feature freeze for OCaml 5.1.

It is a bit too soon right now to try to summarize the new features in OCaml 5.1, since unexpected problems might still require removing some of the new features (even if that happens rarely in practice).

However, I have a quite interesting early example of an unexpected incompatibility due to a refactoring: the more precise support for generative functors breaks the menhir parser generator.

An example of unintended breakage for generative functors

What are generative functors?

In brief, generative functors are a way to express the fact that evaluating a functor creates side effects that meaningfully impact the types that the functor creates, and thus two successive applications of the functor should always yield different types.

This is hopefully clearer with an example. Consider the functor:

let meta = ref 1
module Make_counter(X: sig end): sig
  type t
  val create: unit -> t
  val incr: t -> unit
  val print: t -> int
end
= struct
  let stride = incr meta; !meta
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end

Here, the functor is applicative, and unsafe! We can break the internal assertion that we only add stride to our counters by using the fact that the two modules Counter_1 and Counter_2 share the same type t in:

module A = struct end
module Counter_1 = Make_counter(A)
module Counter_2 = Make_counter(A)

Thus, we can mix calls to functions of the two modules to break one of the internal invariants:

let assert_failure =
  let c = Counter_1.create () in
  Counter_2.incr c;
  Counter_1.print c
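
As a minimal, self-contained sketch of this failure, the snippet below restates the functor and catches the exception instead of letting it escape. The names coerce and invariant_broken are illustrative additions, and the sketch assumes that assertions are enabled (that is, the code is not compiled with -noassert).

```ocaml
let meta = ref 1

module Make_counter (X : sig end) : sig
  type t
  val create : unit -> t
  val incr : t -> unit
  val print : t -> int
end = struct
  (* Each evaluation of the functor body picks a new stride: 2, 3, ... *)
  let stride = (incr meta; !meta)
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end

module A = struct end
module Counter_1 = Make_counter (A) (* stride = 2 *)
module Counter_2 = Make_counter (A) (* stride = 3 *)

(* Accepted by the typechecker: both types are equal to Make_counter(A).t. *)
let coerce : Counter_1.t -> Counter_2.t = fun c -> c

(* Illustrative helper: true when mixing the two modules trips the assertion. *)
let invariant_broken =
  let c = Counter_1.create () in
  Counter_2.incr c; (* adds 3 to the counter *)
  match Counter_1.print c with (* checks divisibility by 2 *)
  | _ -> false
  | exception Assert_failure _ -> true
```

The identity function coerce only typechecks because the two applications share the named argument A, which is exactly what lets the counters be mixed.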

Of course, here the issue is that the functor Make_counter was intended to be used only with an anonymous structure as an argument:

module Counter = Make_counter(struct end)

Here, since we have lost the identity of the anonymous module after the application, we are guaranteed that the type Counter.t is fresh.

Generative functors (available since OCaml 4.02) make it possible to express this intent in the module type system. By defining the functor Make_counter as generative with

module Generative_make_counter(): sig
  type t
  val create: unit -> t
  val incr: t -> unit
  val print: t -> int
end
= struct
  let stride = incr meta; !meta
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end
module Counter = Generative_make_counter()

we inform the module system that

module A = struct end
module Counter_1 = Generative_make_counter(A)

is an error which is rejected with

Error: This is a generative functor. It can only be applied to ()

Consequently, we are guaranteed that each call to Generative_make_counter creates a fresh type t.
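
This guarantee can be sketched with two applications of the generative functor. The snippet restates the functor so that it is self-contained; the module names Fresh_1 and Fresh_2 and the values value_1 and value_2 are illustrative additions.

```ocaml
let meta = ref 1

module Generative_make_counter () : sig
  type t
  val create : unit -> t
  val incr : t -> unit
  val print : t -> int
end = struct
  let stride = (incr meta; !meta)
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end

module Fresh_1 = Generative_make_counter () (* stride = 2 *)
module Fresh_2 = Generative_make_counter () (* stride = 3 *)

(* Each counter is only ever touched by the module that created it,
   so the internal assertion holds. *)
let value_1 =
  let c = Fresh_1.create () in
  Fresh_1.incr c;
  Fresh_1.incr c;
  Fresh_1.print c

let value_2 =
  let c = Fresh_2.create () in
  Fresh_2.incr c;
  Fresh_2.print c

(* Rejected by the typechecker, since Fresh_1.t and Fresh_2.t are
   distinct abstract types:
   let _ = Fresh_2.incr (Fresh_1.create ()) *)
```

Unlike the applicative version, there is no way to write a well-typed function from Fresh_1.t to Fresh_2.t outside of the functor body.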

However, back in 2014, for OCaml 4.02, it was decided to represent the generative application as an application to a syntactic empty structure. In other words,

module Counter_1 = Make_counter()

was represented as

module Counter_1 = Make_counter(struct end)

This choice of representation was simpler, but it had the disadvantage of allowing some confusing code:

First, applicative functors could be applied to the unit argument:

module W = Make_counter()

Second, generative functors could be applied to a syntactically empty structure:

module E = Generative_make_counter(struct end)

At least, both options make it clear that the types of the generated modules would be fresh.

Nevertheless, with more hindsight, it seems better to make the distinction between the two cases clearer. Thus, starting with OCaml 5.1, the parser and the typechecker distinguish between F() and F(struct end).

In OCaml 5.1, applying a generative functor to a syntactically empty structure

module Warning = Generative_make_counter(struct end)

generates a warning

Warning 73 [generative-application-expects-unit]: A generative functor
should be applied to '()'; using '(struct end)' is deprecated.

This warning is here to leave some breathing room for ppxs that had to use this syntax before OCaml 5.1.

Conversely, applying an applicative functor to the empty argument generates an error:

module Error = Make_counter()

Error: The functor was expected to be applicative at this position

During the review of this change, I didn't think about the possibility that some OCaml programs would have switched to the generative syntax for application without making the change to the type of the functor itself.

But this was too optimistic for at least one opam package. This package is now fixed, but it remains to be seen if this was an unfortunate and rare accident. If this is not the case, we will need to add a deprecation warning on this side too.
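
For code hit by this error, a minimal sketch of the two possible fixes is to make the application match the declaration: either keep the functor applicative and apply it to (struct end), or declare it generative and apply it to (). The functor names and bodies below are hypothetical and only serve as an illustration.

```ocaml
(* An applicative functor: its parameter is a (here empty) module. *)
module Applicative_make (X : sig end) = struct
  let id = ref 0
  let next () = incr id; !id
end

(* A generative functor: its parameter is the unit argument (). *)
module Generative_make () = struct
  let id = ref 0
  let next () = incr id; !id
end

(* Each application matches how the functor was declared. *)
module From_applicative = Applicative_make (struct end)
module From_generative = Generative_make ()

(* Rejected with an error in OCaml 5.1:
     module Broken = Applicative_make ()
   Accepted with warning 73 in OCaml 5.1:
     module Deprecated = Generative_make (struct end) *)

let first = From_applicative.next ()
let second = From_generative.next ()
```

Both accepted forms were already valid before OCaml 5.1, so either fix should keep a package compatible with older compilers.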

OCaml Parser and ocamlformat

This week, I also had interesting discussions with members of the ocamlformat team concerning upstreaming some of the ocamlformat patches to the compiler.

As a code formatter, ocamlformat needs to maintain a more precise mapping between its syntax tree and the source code than the main OCaml parser. Indeed, ocamlformat cannot afford to discard meaningful distinctions in the source code due to some syntactic sugar. Conversely, the main compiler only needs to keep enough information about the source code to be able to report errors and print the parsed abstract syntax tree in a good-enough format.

The objectives of the two parsers are thus not completely aligned. However, comparing notes from time to time is a good way to catch potential issues.

Is the compiler losing important location information?
Is the compiler mixing different concerns in the parsing of the source code?
Is the compiler making ppx transformations harder to express because the AST veers too far from the surface language?

A good example of the last two categories was my change for type constraints on value bindings. Indeed, before this change, the OCaml parser read

let f: type a b. a -> b -> a = fun x _ -> x

as if the programmer had written:

let f: 'a 'b. 'a -> 'b -> 'a = fun (type a) (type b) -> (fun x _ -> x : a -> b -> a)

Of course, the two constructs are defined to be equivalent at the level of the typechecker. It is however pretty clear that the distinction between the two is very meaningful for the programmer. Moreover, the transformation is complex enough that ppx authors would probably rather not try to undo it.

Moving the transformation from the parser to the typechecker was thus deemed a good move.
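
As a small sketch of this equivalence, both forms can be written side by side and used at several types. The names f_expanded and same_results are illustrative additions.

```ocaml
(* The surface syntax, as written by the programmer. *)
let f : type a b. a -> b -> a = fun x _ -> x

(* The expanded form that the parser used to produce. *)
let f_expanded : 'a 'b. 'a -> 'b -> 'a =
  fun (type a) (type b) -> (fun x _ -> x : a -> b -> a)

(* Both definitions are polymorphic and behave identically. *)
let same_results =
  f 1 "ignored" = f_expanded 1 "ignored"
  && f "kept" () = f_expanded "kept" ()
```

The typechecker gives both definitions the same polymorphic type, but only the first one reflects what a ppx author or a code formatter would expect to find in the syntax tree.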

For OCaml 5.2, we will look for other refactorings of the parser that would make sense in the main parser while reducing ocamlformat's maintenance burden.