Florian's OCaml compiler weekly, 17 April 2023
- April 17, 2023
This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) daily work on the OCaml compiler. This week, the focus is on the first alpha release of OCaml 5.1.0 and some discussions with the ocamlformat team.
First alpha release for OCaml 5.1.0
Between Friday and Saturday, I published the first alpha release of OCaml 5.1.0. As the first version of OCaml 5 published after the feature freeze for OCaml 5, this version feels like a midpoint between the usual release process for OCaml 4 and the experimental release of OCaml 5.0.0.
In particular, this release will integrate many features that were either frozen during the development of OCaml 5 or merged into the development version after the branch for OCaml 5.0 was cut. For instance, support for RISC-V was merged in July last year, but it will only become available with OCaml 5.1, around next July.
Conversely, the development window for contributors who were busy fixing bugs in OCaml 5.0.0 was especially short, since there were only four months between the OCaml 5.0.0 release and the feature freeze for OCaml 5.1.
It is a bit too soon to summarize the new features of OCaml 5.1, since unexpected problems might still require removing some of them (even if that rarely happens in practice).
However, I already have an interesting early example of an unexpected incompatibility caused by a refactoring: the more precise support for generative functors broke the menhir parser generator.
An example of unintended breakage for generative functors
What are generative functors?
In brief, generative functors are a way to express the fact that evaluating a functor creates side effects that meaningfully impact the types that the functor creates, and thus two successive applications of the functor should always yield different types.
This will hopefully be clearer with an example. Consider the functor:
let meta = ref 1
module Make_counter(X: sig end): sig
  type t
  val create: unit -> t
  val incr: t -> unit
  val print: t -> int
end
= struct
  let stride = incr meta; !meta
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end
Here, the functor is applicative, and unsafe! We can break the internal invariant that counters only ever advance by stride by using the fact that the two modules Counter_1 and Counter_2 share the same type t in
module A = struct end
module Counter_1 = Make_counter(A)
module Counter_2 = Make_counter(A)
Thus, we can mix calls to functions of the two modules to break one of the internal invariants:
let assert_failure =
  let c = Counter_1.create () in
  Counter_2.incr c;
  Counter_1.print c
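The root cause is a general property of applicative functors: two applications of the same functor to the same module path produce equal abstract types. A minimal sketch of just that property, using the hypothetical names Make, M1 and M2 (they are not part of the original example):

```ocaml
(* Hypothetical applicative functor with an abstract type t. *)
module Make (X : sig end) : sig
  type t
  val make : int -> t
  val get : t -> int
end = struct
  type t = int
  let make n = n
  let get n = n
end

module A = struct end
module M1 = Make (A)
module M2 = Make (A)

(* Accepted by the type checker: M1.t and M2.t are both Make(A).t,
   so a value built by M2 can be consumed by M1. *)
let shared : M1.t = M2.make 42
let v = M1.get shared
```

This is exactly the equality that lets Counter_2.incr operate on a counter created by Counter_1.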
Of course, the issue here is that the functor Make_counter was intended to be used only with an anonymous structure as its argument:
module Counter = Make_counter(struct end)
Here, since we have lost the identity of the anonymous module after the application, we are guaranteed that the type Counter.t is fresh.
Generative functors (available since OCaml 4.02) make it possible to express this intent in the module type system. By defining a generative version of Make_counter with
module Generative_make_counter(): sig
  type t
  val create: unit -> t
  val incr: t -> unit
  val print: t -> int
end
= struct
  let stride = incr meta; !meta
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  let print x = assert (!x mod stride = 0); !x
end
module Counter = Generative_make_counter()
we inform the module system that
module A = struct end
module Counter_1 = Generative_make_counter(A)
is an error which is rejected with
Error: This is a generative functor. It can only be applied to ()
Consequently, we are guaranteed that each application of Generative_make_counter creates a fresh type t.
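As a sanity check, here is a self-contained sketch of the generative version in use. It re-declares meta and Generative_make_counter from the example above so that it compiles on its own; the stride values in the comments assume these are the only two applications in the program.

```ocaml
let meta = ref 1

module Generative_make_counter () : sig
  type t
  val create : unit -> t
  val incr : t -> unit
  val print : t -> int
end = struct
  (* Each application bumps the global counter and keeps its own stride. *)
  let stride = (incr meta; !meta)
  type t = int ref
  let create () = ref 0
  let incr t = t := !t + stride
  (* Invariant: a counter is always a multiple of this module's stride. *)
  let print x = assert (!x mod stride = 0); !x
end

module Counter_1 = Generative_make_counter ()  (* stride = 2 *)
module Counter_2 = Generative_make_counter ()  (* stride = 3 *)

let c1 = Counter_1.create ()
let () = Counter_1.incr c1; Counter_1.incr c1

(* Counter_2.incr c1 would now be rejected at compile time, because
   Counter_1.t and Counter_2.t are distinct abstract types. *)
let c2 = Counter_2.create ()
let () = Counter_2.incr c2
```

Each counter can only be advanced by its own module, so the assertion in print holds.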
However, back in OCaml 4.02, in 2014, it was decided to represent a generative application as an application to a syntactically empty structure. In other words,
module Counter_1 = Make_counter()
was represented as
module Counter_1 = Make_counter(struct end)
This choice of representation was simpler, but it had the disadvantage of allowing some confusing code:
- First, applicative functors could be applied to the unit argument:
module W = Make_counter()
- Second, generative functors could be applied to a syntactically empty structure:
module E = Generative_make_counter(struct end)
At least, in both cases, the types of the resulting modules were guaranteed to be fresh.
Nevertheless, with more hindsight, it seems better to make the distinction between the two cases clearer. Thus, starting with OCaml 5.1, the parser and the typechecker distinguish between F() and F(struct end).
In OCaml 5.1, applying a generative functor to a syntactically empty structure
module Warning = Generative_make_counter(struct end)
generates a warning
Warning 73 [generative-application-expects-unit]: A generative functor
should be applied to '()'; using '(struct end)' is deprecated.
This warning is there to leave some breathing room for ppxs that had to use this syntax before OCaml 5.1.
Conversely, applying an applicative functor to the unit argument generates an error
module Error = Make_counter()
Error: The functor was expected to be applicative at this position
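For code hit by this new error, two straightforward fixes should work with every release since OCaml 4.02, including 5.1. The sketch below uses a hypothetical applicative functor Make_id (not the functor from the affected package): either keep the functor applicative and pass (struct end) explicitly, or expose a generative wrapper, here the hypothetical Make_id_gen, and apply it to ().

```ocaml
(* Hypothetical applicative functor standing in for real-world code. *)
module Make_id (X : sig end) : sig
  type t
  val fresh : unit -> t
  val to_int : t -> int
end = struct
  type t = int
  let next = ref 0
  let fresh () = incr next; !next
  let to_int x = x
end

(* Rejected by OCaml 5.1 with "The functor was expected to be
   applicative at this position":
     module Ids = Make_id ()                                          *)

(* Fix 1: keep Make_id applicative and pass an explicit empty structure. *)
module Ids = Make_id (struct end)

(* Fix 2: declare a generative wrapper, whose applications use (). *)
module Make_id_gen () = Make_id (struct end)
module Ids_gen = Make_id_gen ()
```

Fix 2 documents the intent in the functor type itself, which is what the OCaml 5.1 check rewards; Fix 1 is the smaller change when the functor must stay applicative.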
During the review of this change, I didn't think about the possibility that some OCaml programs might have switched to the generative syntax for functor application without changing the type of the functor itself.
But this was too optimistic for at least one opam package. This package is now fixed, but it remains to be seen whether this was an unfortunate and rare accident. If it was not, we will need to add a deprecation warning on this side too.
OCaml Parser and ocamlformat
This week, I also had interesting discussions with members of the ocamlformat team about upstreaming some of the ocamlformat patches to the compiler.
As a code formatter, ocamlformat needs to maintain a more precise mapping between its syntax tree and the source code than the main OCaml parser does. Indeed, ocamlformat cannot afford to discard meaningful distinctions in the source code because of some syntactic sugar. Conversely, the main compiler only needs to keep enough information about the source code to report errors and to print the parsed abstract syntax tree in a good-enough format.
The objectives of the two parsers are thus not completely aligned. However, comparing notes from time to time is a good way to catch potential issues.
- Is the compiler losing important location information?
- Is the compiler mixing different concerns in the parsing of the source code?
- Is the compiler making ppx transformations harder to express because the AST veers too far from the surface language?
A good example of the last two categories was my change for type constraints on value bindings. Indeed, before this change the OCaml parser read
let f: type a b. a -> b -> a = fun x _ -> x
as if the programmer had written:
let f: 'a 'b. 'a -> 'b -> 'a = fun (type a) (type b) -> (fun x _ -> x : a -> b -> a)
Of course, the two constructs are defined to be equivalent at the level of the typechecker. However, it is pretty clear that the distinction between the two is very meaningful for the programmer. Moreover, the transformation is complex enough that ppx authors would probably rather not try to undo it.
Moving the transformation from the parser to the typechecker was thus deemed a good move.
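The typechecker-level equivalence of the two forms can be checked directly: both bindings below are accepted and yield a polymorphic function of type 'a -> 'b -> 'a. The names f_sugar and f_desugared are only for illustration.

```ocaml
(* What the programmer writes: locally abstract types a and b. *)
let f_sugar : type a b. a -> b -> a = fun x _ -> x

(* What the parser used to produce: an explicit polymorphic annotation,
   with the locally abstract types moved into the expression. *)
let f_desugared : 'a 'b. 'a -> 'b -> 'a =
  fun (type a) (type b) -> (fun x _ -> x : a -> b -> a)
```

Both functions can be used at several unrelated types, which shows that the annotation in each case is genuinely polymorphic.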
For OCaml 5.2, we will look for other refactorings of the parser that would make sense for the main parser while reducing the ocamlformat maintenance burden.