Gagallium : Florian’s OCaml compiler weekly, 12 June 2023

This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) daily work on the OCaml compiler. This week, the focus is on my roadmap for OCaml 5.2.0.

My roadmap for 5.2.0

With the stabilisation of OCaml 5.1.0, I have been taking the time to write down some of my main objectives for OCaml 5.2. There are two improvements to the compiler interfaces that I really want to see materialise in OCaml 5.2: a structured output for compiler messages, and a unified short-paths implementation shared between Merlin and the compiler.

A structured output for compiler messages

In order to better communicate with the various OCaml development tools, it should be possible to emit compiler messages in a structured format (JSON or SEXP). One of the Outreachy internships that I mentored implemented a first version of JSON messages a few years ago, in 2020.

However, one criticism of this approach was that it was not clear whether the format would be useful for tools like dune. Similarly, it was not clear whether this format could evolve in a backward-compatible way.

After discussing the issue further with dune developers, the conclusion was that, while any kind of structured output would be useful for dune, a versioned and backward-compatible output would be really helpful.

This is why I am planning to return to my structured output work in OCaml 5.2, this time focusing on a versioned and structured log facility that can be connected to various backends.
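To make the idea of a versioned log with pluggable backends more concrete, here is a minimal sketch. Everything in it is hypothetical: the `message` record, the `backend` type, `format_version`, and the two backends are invented for illustration and are not part of the compiler. The JSON is written by hand to keep the example dependency-free, and the SEXP backend reuses OCaml string quoting as an approximation of S-expression quoting.

```ocaml
(* Hypothetical sketch of a versioned, structured log for compiler
   messages. None of these names exist in the OCaml compiler. *)

(* Every emitted message carries the format version, so that consumers
   such as dune can check what they are reading. *)
let format_version = (1, 0)

type severity = Error | Warning

type message = { severity : severity; file : string; line : int; text : string }

(* A backend only needs to know how to render one message. *)
type backend = message -> string

let severity_name = function Error -> "error" | Warning -> "warning"

(* Minimal JSON string escaping: quotes, backslashes and newlines. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

let json_backend : backend =
 fun m ->
  let major, minor = format_version in
  Printf.sprintf
    "{\"version\":[%d,%d],\"severity\":%s,\"file\":%s,\"line\":%d,\"text\":%s}"
    major minor
    (json_string (severity_name m.severity))
    (json_string m.file) m.line (json_string m.text)

(* %S produces OCaml-style quoted strings, used here as an approximation
   of S-expression string quoting. *)
let sexp_backend : backend =
 fun m ->
  let major, minor = format_version in
  Printf.sprintf "((version (%d %d)) (severity %s) (file %S) (line %d) (text %S))"
    major minor (severity_name m.severity) m.file m.line m.text

(* The log facility itself is parameterised by the backend. *)
let emit (backend : backend) (m : message) = print_endline (backend m)

let () =
  let m = { severity = Error; file = "main.ml"; line = 3; text = "Unbound value x" } in
  emit json_backend m;
  emit sexp_backend m
```

Keeping the version inside each message, rather than only in documentation, is one way to let a consumer reject or adapt to a format it does not know; adding new fields while never removing old ones is what would keep older consumers working.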

Unified short paths in compiler messages

When printing type paths in error messages, it is useful to print user-friendly type names like M.t and not whatever path the typechecker stumbled upon after various expansions like A.Very.Long(Type).Application.t.

Both the compiler type pretty-printer and Merlin have an implementation for discovering and computing canonical type paths in error messages, which is enabled by the -short-paths option.
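As a toy illustration of what "choosing a canonical path" means, the sketch below assumes that the set of paths known to denote the same type has already been discovered, which is the hard part and is done very differently by the two real implementations. The ranking rule (fewest components, then fewest characters) and the names `path`, `size` and `shortest` are assumptions made for this example only; a functor application is crudely modelled as a single component.

```ocaml
(* A path such as M.t is modelled as its list of components. This is a
   hypothetical toy, not the representation used by the compiler or Merlin. *)
type path = string list

let to_string (p : path) = String.concat "." p

(* Hypothetical ranking: fewer components first, then fewer characters. *)
let size (p : path) = (List.length p, String.length (to_string p))

(* Pick the best-ranked path among paths assumed to denote the same type. *)
let shortest (candidates : path list) : path option =
  match candidates with
  | [] -> None
  | p :: rest ->
      Some
        (List.fold_left
           (fun best q -> if compare (size q) (size best) < 0 then q else best)
           p rest)

let () =
  let candidates =
    [ [ "A"; "Very"; "Long(Type)"; "Application"; "t" ]; [ "N"; "M"; "t" ]; [ "M"; "t" ] ]
  in
  match shortest candidates with
  | Some p -> print_endline (to_string p)
  | None -> print_endline "no candidate"
```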

However, the implementation of this path normalisation is completely different between Merlin and the compiler. Having two implementations is painful in terms of both maintenance and evolution of this feature. Moreover, the compiler implementation was originally meant to be a temporary prototype for OCaml 4.01.0, ten years ago.

This is why I am hoping to find the time to finally upstream Merlin’s implementation of the -short-paths flag.

Updating `ppxlib` after a parsetree refinement

Last week, I also spent some of my time working with the ppxlib team to iron out the last wrinkles of the second alpha for 5.1.0.

From the point of view of ppxlib, one of the interesting challenges introduced by the value binding parsetree change in 5.1.0 is that it added a new way to represent an old construct:

let x : typ = expr

rather than a completely new construct.

Indeed, before OCaml 5.1, this construct was desugared to

let (x:ø. typ) = (expr:typ)

whereas in OCaml 5.1, this construct is a new distinct parsetree node.
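To make the change concrete, here is a heavily simplified, hypothetical model of the two representations. The type and field names (`vb_50`, `vb_51`, `constr51`, `Tpoly`, and so on) are invented for illustration and are not the real Parsetree or ppxlib definitions, which also carry locations and attributes. The function shows the 5.1 to 5.0 direction re-creating the old desugaring, with ø. t modelled as `Tpoly ([], t)`.

```ocaml
(* Hypothetical, heavily simplified model of the relevant AST fragments. *)
type typ = Tconstr of string | Tpoly of string list * typ
type pat = Pvar of string | Pconstraint of pat * typ
type expr = Eint of int | Econstraint of expr * typ

(* 5.0: [let x : t = e] only exists in desugared form. *)
type vb_50 = { pat50 : pat; expr50 : expr }

(* 5.1: the annotation has its own field next to the pattern. *)
type vb_51 = { pat51 : pat; constr51 : typ option; expr51 : expr }

(* Downgrade a 5.1 binding by re-creating the 5.0 desugaring. *)
let to_50 (vb : vb_51) : vb_50 =
  match vb.pat51, vb.constr51 with
  | Pvar _, Some t ->
      (* let x : t = e  becomes  let (x : [empty poly]. t) = (e : t) *)
      { pat50 = Pconstraint (vb.pat51, Tpoly ([], t)); expr50 = Econstraint (vb.expr51, t) }
  | p, Some t ->
      (* non-variable patterns get a plain pattern constraint *)
      { pat50 = Pconstraint (p, t); expr50 = vb.expr51 }
  | p, None -> { pat50 = p; expr50 = vb.expr51 }

let () =
  (* let x : int = 0 *)
  let let_x = { pat51 = Pvar "x"; constr51 = Some (Tconstr "int"); expr51 = Eint 0 } in
  assert (
    to_50 let_x
    = { pat50 = Pconstraint (Pvar "x", Tpoly ([], Tconstr "int"));
        expr50 = Econstraint (Eint 0, Tconstr "int") })
```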

This new parsetree node led to some interesting questions when migrating abstract syntax trees between versions:

When migrating from 5.1 to 5.0, can we reproduce the old encoding down to the location information?

Perhaps surprisingly, the answer is no. This is due to the ghost location used in the ghost constraint node: when desugaring let x : ((((int)))) = 0 to

let (x:ø. int) = (0:int)

the 5.0 parser gave the pattern x:ø. int the ghost location

let x : ((((int))))  =
    ^^^^^^^^^^^^^^^

But the new parsetree node only records the location of the type int, not the location of the closing parentheses. We thus lose a bit of information, because the former encoding used a concrete syntax tree location that we no longer have access to. Fortunately, this only affects the ghost location of a ghost parsetree node.
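The information loss can be made concrete with a toy model of locations as half-open character ranges into a single source line. This is a sketch under stated assumptions: the `loc` record is hypothetical (it is not the compiler's `Location.t`), and the offsets are computed by hand for this one input. The 5.0 ghost location ran up to the last closing parenthesis, whereas a 5.1 to 5.0 migration that only knows the pattern and type locations can only rebuild a span ending at `int`.

```ocaml
(* Hypothetical toy: a location is a half-open range [start, stop). *)
type loc = { start : int; stop : int }

let text_of src l = String.sub src l.start (l.stop - l.start)

let src = "let x : ((((int)))) = 0"

(* Offsets found by hand for this particular input. *)
let pat_loc = { start = String.index src 'x'; stop = String.index src 'x' + 1 }
let typ_loc = let s = String.index src 'i' in { start = s; stop = s + 3 }
let parens_end = String.rindex src ')' + 1

(* 5.0: the ghost constraint pattern spanned up to the last ')'. *)
let ghost_50 = { start = pat_loc.start; stop = parens_end }

(* A 5.1 -> 5.0 migration only has [pat_loc] and [typ_loc], so the best
   span it can rebuild stops at the end of [int]. *)
let rebuilt = { start = pat_loc.start; stop = typ_loc.stop }

let () =
  Printf.printf "5.0 ghost location: %S\n" (text_of src ghost_50);
  Printf.printf "rebuilt location:   %S\n" (text_of src rebuilt)
```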

Is the migration from the 5.1 parsetree to the 5.0 parsetree always injective?

Most of the time, it is possible to map an OCaml 5.1 value binding onto a unique OCaml 5.0 encoded value binding. This works because the encoding used for value bindings in 5.0 builds type expressions of the form ø. typ, which cannot be written in OCaml source code. We can thus use those special type expressions to recognise desugared value bindings.
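The recognition step can be sketched on a heavily simplified, hypothetical model of the two parsetrees; the names below (`vb_50`, `vb_51`, `of_50`, `Tpoly`, and so on) are invented and are not the real ppxlib migration code. Since an empty polytype `Tpoly ([], t)` can only come from the desugaring, seeing one on a variable pattern, together with the matching expression constraint, is taken as the signal to rebuild the new 5.1 node. Requiring the two annotations to be equal is an assumption of this sketch.

```ocaml
(* Hypothetical, heavily simplified model of the relevant AST fragments. *)
type typ = Tconstr of string | Tpoly of string list * typ
type pat = Pvar of string | Pconstraint of pat * typ
type expr = Eint of int | Econstraint of expr * typ

type vb_50 = { pat50 : pat; expr50 : expr }
type vb_51 = { pat51 : pat; constr51 : typ option; expr51 : expr }

(* Upgrade 5.0 -> 5.1 for variable bindings. *)
let of_50 (vb : vb_50) : vb_51 =
  match vb.pat50, vb.expr50 with
  | Pconstraint ((Pvar _ as p), Tpoly ([], t)), Econstraint (e, t') when t = t' ->
      (* only the parser's desugaring produces an empty polytype *)
      { pat51 = p; constr51 = Some t; expr51 = e }
  | p, e ->
      (* anything else stays an ordinary pattern, e.g. let (x : int) = 0 *)
      { pat51 = p; constr51 = None; expr51 = e }

let () =
  (* the 5.0 encoding of let x : int = 0 *)
  let encoded =
    { pat50 = Pconstraint (Pvar "x", Tpoly ([], Tconstr "int"));
      expr50 = Econstraint (Eint 0, Tconstr "int") }
  in
  assert (of_50 encoded = { pat51 = Pvar "x"; constr51 = Some (Tconstr "int"); expr51 = Eint 0 })
```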

Unfortunately, this is only the case when binding variables. Indeed, as soon as the pattern in the value binding is not a variable, as in

let (x,y) : int * int = 0, 1

the 5.0 parser desugars this value binding to

let ((x,y) : int * int) = 0, 1

without any encoding of the type constraint. This means in particular that the 5.0 parser creates the same AST node for both

let ((x,y) : int * int) = 0, 1

and

let (x,y) : int * int = 0, 1

which are two different constructs in OCaml 5.1.

Consequently, when we migrate a 5.0 parsetree of this form to 5.1, we have to decide whether to migrate this syntactic construct to the old parsetree node or to the new one. In this case, since the new syntactic node corresponds to a “more pleasant” syntactic form, we decided to favour it.
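The ambiguity, and the choice made to resolve it, can be sketched on a heavily simplified, hypothetical model of the two parsetrees (all names invented; this is not the actual ppxlib migration code). Downgrading the two distinct 5.1 bindings yields the same 5.0 tree, so the upgrade must pick one of them, and this sketch picks the new form for constrained non-variable patterns.

```ocaml
(* Hypothetical, heavily simplified model of the relevant AST fragments. *)
type typ = Tconstr of string | Ttuple of typ list | Tpoly of string list * typ
type pat = Pvar of string | Ptuple of pat list | Pconstraint of pat * typ
type expr = Eint of int | Etuple of expr list | Econstraint of expr * typ

type vb_50 = { pat50 : pat; expr50 : expr }
type vb_51 = { pat51 : pat; constr51 : typ option; expr51 : expr }

(* 5.1 -> 5.0: non-variable patterns get a plain pattern constraint. *)
let to_50 (vb : vb_51) : vb_50 =
  match vb.pat51, vb.constr51 with
  | Pvar _, Some t ->
      { pat50 = Pconstraint (vb.pat51, Tpoly ([], t)); expr50 = Econstraint (vb.expr51, t) }
  | p, Some t -> { pat50 = Pconstraint (p, t); expr50 = vb.expr51 }
  | p, None -> { pat50 = p; expr50 = vb.expr51 }

(* 5.0 -> 5.1: a constrained non-variable pattern is ambiguous; this
   sketch resolves the ambiguity in favour of the new node. *)
let of_50 (vb : vb_50) : vb_51 =
  match vb.pat50, vb.expr50 with
  | Pconstraint ((Pvar _ as p), Tpoly ([], t)), Econstraint (e, t') when t = t' ->
      { pat51 = p; constr51 = Some t; expr51 = e }
  | Pconstraint (p, t), e when (match p with Pvar _ -> false | _ -> true) ->
      { pat51 = p; constr51 = Some t; expr51 = e }
  | p, e -> { pat51 = p; constr51 = None; expr51 = e }

let int_t = Tconstr "int"
let xy = Ptuple [ Pvar "x"; Pvar "y" ]
let rhs = Etuple [ Eint 0; Eint 1 ]

(* let ((x,y) : int * int) = 0, 1 *)
let old_form = { pat51 = Pconstraint (xy, Ttuple [ int_t; int_t ]); constr51 = None; expr51 = rhs }

(* let (x,y) : int * int = 0, 1 *)
let new_form = { pat51 = xy; constr51 = Some (Ttuple [ int_t; int_t ]); expr51 = rhs }

let () =
  (* two distinct 5.1 trees collapse to the same 5.0 tree... *)
  assert (old_form <> new_form);
  assert (to_50 old_form = to_50 new_form);
  (* ...and the upgrade picks the new form *)
  assert (of_50 (to_50 old_form) = new_form)
```

The price of this choice is that a 5.1 binding written in the old style, with explicit parentheses around the constrained pattern, comes back from a round trip through 5.0 in the new style.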