Florian's OCaml compiler weekly, 12 June 2023
- June 12, 2023
This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) daily work on the OCaml compiler. This week, the focus is on my roadmap for OCaml 5.2.0.
My roadmap for 5.2.0
With the stabilisation of OCaml 5.1.0, I have been taking the time to write down some of my main objectives for OCaml 5.2. There are two improvements to the compiler interfaces that I really want to see materialise in OCaml 5.2: a structured output for compiler messages and a unified short-paths implementation between Merlin and the compiler.
A structured output for compiler messages
In order to better communicate with the various OCaml development tools, it should be possible to emit compiler messages in a structured format (JSON or SEXP). One of the Outreachy internships that I mentored implemented a first version of JSON messages a few years ago, in 2020.
However, one criticism of this approach was that it was not clear if the format would be useful for tools like dune. Similarly, it was not clear if this format could evolve in a backward compatible way.
After discussing the issue further with dune developers, the conclusion was that, although any kind of structured output would be useful for dune, a versioned and backward compatible output would be especially helpful.
This is why I am planning to go back to my structured output work in OCaml 5.2, while focusing on a versioned and structured log facility that can be connected to various backends.
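To make the idea of a versioned log with several backends a bit more concrete, here is a minimal sketch. All the names below (severity, message, log, to_text, to_json) are hypothetical and invented for illustration; this is not the design that will land in the compiler, only one way such a facility could be shaped.

```ocaml
(* Hypothetical sketch: none of these names exist in the compiler. *)
type severity = Error | Warning

type message = {
  severity : severity;
  file : string;
  line : int;
  text : string;
}

(* The version travels with every log, so that a consumer can detect
   a format that it does not understand. *)
type log = { version : int * int; messages : message list }

let severity_to_string = function
  | Error -> "error"
  | Warning -> "warning"

(* Backend 1: human-readable text. *)
let to_text log =
  log.messages
  |> List.map (fun m ->
         Printf.sprintf "File %S, line %d: %s: %s" m.file m.line
           (severity_to_string m.severity) m.text)
  |> String.concat "\n"

(* Minimal JSON string escaping, sufficient for this sketch. *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 ->
          Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Backend 2: the same log, rendered as JSON. *)
let to_json log =
  let major, minor = log.version in
  let message m =
    Printf.sprintf "{\"severity\":%s,\"file\":%s,\"line\":%d,\"text\":%s}"
      (json_string (severity_to_string m.severity))
      (json_string m.file) m.line (json_string m.text)
  in
  Printf.sprintf "{\"version\":[%d,%d],\"messages\":[%s]}" major minor
    (String.concat "," (List.map message log.messages))
```

The point of the sketch is that the log is built once as data, and each backend is only a printer for that data; the version field is what would allow the format to evolve without breaking existing consumers.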
Unified short paths in compiler messages
When printing type paths in error messages, it is useful to print user-friendly type names like M.t and not whatever path the typechecker stumbled upon after various expansions, like A.Very.Long(Type).Application.t.
Both the compiler type pretty-printer and Merlin have an implementation for discovering and computing canonical type paths in error messages, which is enabled by the -short-paths option.
However, the implementation of this path normalisation is completely different between Merlin and the compiler. Having two implementations is painful in terms of both maintenance and evolution of this feature. Moreover, the compiler implementation was originally meant to be a temporary prototype for OCaml 4.01.0, ten years ago.
This is why I am hoping to find the time to finally upstream Merlin’s implementation of the -short-paths flag.
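As a rough illustration of what "choosing a canonical path" means, here is a toy model. It is neither the compiler's nor Merlin's algorithm: it assumes that the set of equivalent paths is already known, and it uses a made-up preference (fewer components first, then fewer characters). The names path, compare_paths and shortest are hypothetical.

```ocaml
(* A toy model: a type path is a list of components, e.g. ["M"; "t"]. *)
type path = string list

let to_string (p : path) = String.concat "." p

(* Assumed preference: fewer components first, then fewer characters. *)
let compare_paths p q =
  match compare (List.length p) (List.length q) with
  | 0 -> compare (String.length (to_string p)) (String.length (to_string q))
  | c -> c

(* [shortest aliases] picks the preferred name among paths that are
   already known to denote the same type. *)
let shortest : path list -> path option = function
  | [] -> None
  | p :: ps ->
      Some
        (List.fold_left
           (fun best q -> if compare_paths q best < 0 then q else best)
           p ps)
```

The hard part in a real implementation is the step assumed away here: discovering which paths are equivalent, and doing so efficiently.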
Updating ppxlib after a parsetree refinement
Last week, I also spent some of my time working with the ppxlib team to iron out the last wrinkles of the second alpha for 5.1.0.
From the point of view of ppxlib, one of the interesting challenges introduced by the value binding parsetree change in 5.1.0 is that it added a new way to represent an old construct
let x : typ = expr
rather than a completely new construct.
Indeed, before OCaml 5.1, this construct was desugared to
let (x:ø. typ) = (expr:typ)
whereas in OCaml 5.1, this construct is a new distinct parsetree node.
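The difference between the two representations can be sketched with toy types. These types are simplified stand-ins invented for this example (the real definitions are in the compiler's Parsetree module and are much richer); Poly ([], t) plays the role of the type expression written ø. t above.

```ocaml
(* Toy types, not the real Parsetree. *)
type typ =
  | Int
  | Poly of string list * typ (* [Poly ([], t)] stands for "ø. t" *)

type pattern =
  | Var of string
  | Pat_constraint of pattern * typ

type expr =
  | Const of int
  | Exp_constraint of expr * typ

(* 5.0 style: a value binding is only a pattern and an expression. *)
type binding_50 = { pat : pattern; body : expr }

(* 5.1 style, restricted to variables: the constraint has its own field. *)
type binding_51 = { name : string; constr : typ option; rhs : expr }

(* Encode a 5.1 binding of a variable the way it was desugared before 5.1:
   [let x : typ = expr] becomes [let (x : ø. typ) = (expr : typ)]. *)
let to_50 (b : binding_51) : binding_50 =
  match b.constr with
  | None -> { pat = Var b.name; body = b.rhs }
  | Some t ->
      { pat = Pat_constraint (Var b.name, Poly ([], t));
        body = Exp_constraint (b.rhs, t) }
```

For instance, the toy version of let x : int = 0 is { name = "x"; constr = Some Int; rhs = Const 0 }, and to_50 wraps both sides of the binding in a constraint.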
This new parsetree node leads to some interesting questions when migrating Abstract Syntax Trees between versions:
When migrating from 5.1 to 5.0, can we reproduce the old encoding down to the location information?
Maybe surprisingly, the answer is no. This is due to the
ghost location used in the ghost constraint node: when desugaring
let x: (((int))) = 0
to
let (x:ø. int) = 0
the 5.0 parser attributed to the pattern x:ø. typ the ghost location
let x : ((((int)))) =
    ^^^^^^^^^^^^^^^
But the new parsetree node only contains the location of the type int and not the location of the end of the parentheses. We are thus losing a bit of information because the former encoding was using a concrete syntax tree location that we no longer have access to. Fortunately, this only happens on a ghost location of a ghost parsetree node.
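The lost information can be made concrete with a small worked example on column numbers. This is a simplified model using a hypothetical loc record with plain column offsets (the compiler's real locations carry more data), and it assumes that the best a migration can do is to span from the start of the pattern to the end of the type.

```ocaml
(* Columns in the source line; [stop] is exclusive. *)
type loc = { start : int; stop : int }

let source = "let x : ((((int)))) ="

(* Text covered by a location. *)
let text l = String.sub source l.start (l.stop - l.start)

(* Smallest location going from the start of [a] to the end of [b]. *)
let span a b = { start = a.start; stop = b.stop }

let x_loc = { start = 4; stop = 5 }     (* the variable x *)
let int_loc = { start = 12; stop = 15 } (* the type int, without parentheses *)

(* What the 5.0 parser recorded: from x to the last closing parenthesis. *)
let parsed_50 = { start = 4; stop = 19 }

(* What can be rebuilt from the locations kept in the 5.1 node. *)
let rebuilt = span x_loc int_loc
```

Here text parsed_50 covers the whole of x : ((((int)))), whereas text rebuilt stops right after int: the four closing parentheses are not recoverable.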
Is the migration from the 5.1 parsetree to the 5.0 parsetree always injective?
Most of the time, it is possible to map an OCaml 5.1 value binding onto a unique OCaml 5.0 encoded value binding. This works because the encoding used for value bindings in 5.0 constructs type expressions of the form ø. typ that are not allowed in OCaml. We can thus use those special type expressions to recognise desugared value bindings.
Unfortunately, this is only the case when binding variables. Indeed, as soon as the pattern in the value binding is not a variable, as in
let (x,y) : int * int = 0, 1
the 5.0 parser desugars this value binding to
let ((x,y) : int * int) = 0, 1
without any encoding of the type constraint. This means in particular that the 5.0 parser creates the same AST node for both
let ((x,y) : int * int) = 0, 1
and
let (x,y) : int * int = 0, 1
which are two different constructs in OCaml 5.1.
Consequently, when we migrate a 5.0 parsetree of this form to 5.1, we have to decide whether to migrate this syntactic construct to the old parsetree node or to the new one. In this case, since the new node corresponds to a “more pleasant” syntactic form, we decided to favour this form.
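The whole situation can be summarised with a toy model of the two migrations. The types and the functions down and up are simplified stand-ins invented for this sketch, not the ppxlib implementation; Poly ([], t) again stands for ø. t.

```ocaml
(* Toy types, not the real Parsetree. *)
type typ = Int | Tuple_t of typ list | Poly of string list * typ
type pattern =
  | Var of string
  | Tuple_p of pattern list
  | Constrained of pattern * typ
type expr = Const of int | Tuple_e of expr list | Typed of expr * typ

type binding_50 = { pat : pattern; body : expr }
type binding_51 = { lhs : pattern; constr : typ option; rhs : expr }

(* 5.1 to 5.0: only variable bindings receive the "ø." marker. *)
let down (b : binding_51) : binding_50 =
  match b.lhs, b.constr with
  | _, None -> { pat = b.lhs; body = b.rhs }
  | Var _, Some t ->
      { pat = Constrained (b.lhs, Poly ([], t)); body = Typed (b.rhs, t) }
  | _, Some t -> { pat = Constrained (b.lhs, t); body = b.rhs }

(* 5.0 to 5.1: the marker identifies encoded variable bindings; for a
   constrained tuple pattern, the ambiguity is resolved in favour of the
   new node. *)
let up (b : binding_50) : binding_51 =
  match b.pat, b.body with
  | Constrained ((Var _ as v), Poly ([], t)), Typed (e, t') when t = t' ->
      { lhs = v; constr = Some t; rhs = e }
  | Constrained ((Tuple_p _ as p), t), e -> { lhs = p; constr = Some t; rhs = e }
  | p, e -> { lhs = p; constr = None; rhs = e }

let pair_t = Tuple_t [ Int; Int ]
let xy = Tuple_p [ Var "x"; Var "y" ]
let zero_one = Tuple_e [ Const 0; Const 1 ]

(* let (x,y) : int * int = 0, 1 *)
let new_form = { lhs = xy; constr = Some pair_t; rhs = zero_one }

(* let ((x,y) : int * int) = 0, 1 *)
let old_form = { lhs = Constrained (xy, pair_t); constr = None; rhs = zero_one }
```

In this model, down new_form and down old_form are equal although new_form and old_form differ, so down is not injective; and up (down old_form) yields new_form, which mirrors the choice of favouring the new node.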