Florian’s compiler weekly, 13 January 2025
- January 13, 2025
This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) work on the OCaml compiler. This week's subject is my personal retrospective on the release of OCaml 5.3.0.
The beginning of 2025 and the release of OCaml 5.3 feel like a good time for some introspection on my work on the compiler during this release.
Looking back at the changelog, I have participated in roughly 50 changes in the 5.3 release. Most of those changes (~40) can be classified into five major themes:
- Error messages
- Compiler display infrastructure
- Tooling integration
- Manual and documentation
- Type system bug fixes
Looking back, I have been quite busy this release with background work, and I still have more background work planned for OCaml 5.4. Hopefully, this background work will bear more visible fruit in the next releases.
Error messages
The first theme for this release is a recurring one for me, and I hope to be able to spend even more time on this subject soon. Indeed, for this release, most of the error message improvements were relatively quick improvements on various topics (first-class modules, labelled function arguments, functors and type clashes). Nevertheless, I still have many larger projects planned for improving error messages in the longer term, in particular:
- Efficient diffing for module level error messages: there is a prototype implementation written by Malo Monin during his internship last summer, which needs more polish before being integrated.
- Semantic diffing on type expressions: using the more reliable error trace, I hope to come back to my previous work on syntactic difference highlighting in type expressions and implement a fully semantic version.
For more details, here are the corresponding changelog entries extracted from the 5.3 changelog:
#12980: Explain type mismatch involving first-class modules by including the module level error message (Florian Angeletti, review by Vincent Laviron)
#12985, #12988: Better error messages for partially applied functors. (Florian Angeletti, report by Arthur Wendling, review by Gabriel Scherer)
#13034, #13260: Better error messages for mismatched function labels (Florian Angeletti, report by Daniel Bünzli, review by Gabriel Scherer and Samuel Vivien)
#13341: a warning when the pattern-matching compiler pessimizes code because side-effects may mutate the scrutinee during matching. (This warning is disabled by default, as this rarely happens and its performance impact is typically not noticeable.) (Gabriel Scherer, review by Nick Roberts, Florian Angeletti and David Allsopp)
#13255: Re-enable warning 34 for unused locally abstract types (Nick Roberts, review by Chris Casinghino and Florian Angeletti)
#12182: Improve the type clash error message. For example, this message: This expression has type … is changed into: The constant “42” has type … (Jules Aguillon, review by Gabriel Scherer and Florian Angeletti)
#13170: Fix a bug that would result in some floating alerts [@@@alert ...] incorrectly triggering Warning 53. (Nicolás Ojeda Bär, review by Chris Casinghino and Florian Angeletti)
#13203: Do not issue warning 53 if the compiler is stopping before attributes have been accurately marked. (Chris Casinghino, review by Florian Angeletti)
Compiler display infrastructure
This release I ended up spending a sizeable amount of time refactoring or improving the various display mechanisms in the compiler. In particular, OCaml 5.3 comes with a new internal format for error messages, a new graphical debugger printer for type expressions, and fixes for many smaller printing bugs for booleans and the mod operator.
This trend will continue in OCaml 5.4 since I have already launched a project on updating the formatting of error messages and warnings, and I am hoping to finally integrate my work on structured diagnostics in this version of OCaml.
However, beyond this (significant) piece of work, I don’t really have longer-term plans on this subject.
The related changelog entries for OCaml 5.3 are:
Highlights
#13049: graphical debugging printer for types (Florian Angeletti, review by Gabriel Scherer)
#13169, #13311: Introduce a document data type for compiler messages rather than relying on Format.formatter -> unit closures. (Florian Angeletti, review by Gabriel Scherer)
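To give an idea of why a document data type is attractive compared to closures, here is a minimal sketch. The `element` and `doc` types and the `render`, `text_length` and `to_string` helpers are hypothetical names invented for this illustration; they are not the compiler's actual internal API. The point is only that a message stored as data can be inspected or transformed before printing, whereas a `Format.formatter -> unit` closure can only be run.

```ocaml
(* A message as an opaque closure: the only thing we can do is run it. *)
let closure_msg : Format.formatter -> unit =
  fun ppf -> Format.fprintf ppf "type@ mismatch"

(* Hypothetical, simplified document type: a message is plain data. *)
type element =
  | Text of string  (* literal text *)
  | Space           (* breakable space, like "@ " in a format string *)

type doc = element list

let doc_msg : doc = [ Text "type"; Space; Text "mismatch" ]

(* Interpret a document on a formatter. *)
let render (ppf : Format.formatter) (d : doc) : unit =
  List.iter
    (function
      | Text s -> Format.pp_print_string ppf s
      | Space -> Format.pp_print_space ppf ())
    d

(* Because documents are data, they can be analysed without printing. *)
let text_length (d : doc) : int =
  List.fold_left
    (fun n -> function Text s -> n + String.length s | Space -> n + 1)
    0 d

(* Render a document to a string through a regular formatter. *)
let to_string (d : doc) : string = Format.asprintf "%a" render d
```

In this toy version, `text_length doc_msg` can be computed without touching a formatter, which has no equivalent for `closure_msg`.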
Error messages styling
#12891: Improved styling for initial prompt (Florian Angeletti, review by Gabriel Scherer)
#13263, #13560: fix printing true and false in toplevel and error messages (no more unexpected #true) (Florian Angeletti, report by Samuel Vivien, review by Gabriel Scherer)
#13151: name conflicts explanation as a footnote (Florian Angeletti, review by Gabriel Scherer)
#13053: Improved display of builtin types such as _ list when aliased. (Samuel Vivien, review by Florian Angeletti)
Internal refactoring and bug fixes
#13336: compiler-libs, split the Printtyp in three to only keep “user-friendly” functions in the Printtyp module. (Florian Angeletti, review by Gabriel Scherer)
#12888: fix printing of uncaught exceptions in .cmo files passed on the command-line of the toplevel. (Nicolás Ojeda Bär, review by Florian Angeletti, report by Daniel Bünzli)
#13099: Fix erroneous loading of cmis for some module type errors. (Nick Roberts, review by Florian Angeletti)
#13251: Register printer for errors in Emitaux (Vincent Laviron, review by Miod Vallat and Florian Angeletti)
Source printer
#13391, #13551: fix a printing bug with -dsource when using raw literal inside a locally abstract type constraint (i.e. let f: type \#for. ...) (Florian Angeletti, report by Nick Roberts, review by Richard Eisenberg)
#13603, #13604: fix source printing in the presence of the escaped raw identifier \#mod. (Florian Angeletti, report by Chris Casinghino, review by Gabriel Scherer)
Tooling integration
Another important subject during this release was the improvement of the metadata generated for Merlin. OCaml 5.3.0 metadata now track identifiers more accurately across implementations and interfaces, and record precisely how implementation identifiers were matched to interface identifiers in a module. For the next release, I am planning to improve tooling integration with Merlin by reducing the differences between Merlin's typechecker and the compiler's typechecker. The exact specification can be found by following the link in the corresponding changelog entry:
#13308: keep track of relations between declaration in the cmt files. This is useful information for external tools for navigation and analysis purposes. (Ulysse Gérard, Florian Angeletti, review by Florian Angeletti and Gabriel Scherer)
#13286: Distinguish unique identifiers Shape.Uid.t according to their provenance: either an implementation or an interface. (Ulysse Gérard, review by Florian Angeletti and Leo White)
In a similar way, I have also worked on improving the compatibility of the internal compiler library with ppxlib and MetaOCaml, and implemented a new command line flag to improve the backward compatibility of the lexer:
#11129, #11148: enforce that ppxs do not produce parsetrees with an empty list of universally quantified type variables (. int -> int instead of 'a . int -> int) (Florian Angeletti, report by Simmo Saan, review by Gabriel Scherer)
#13471: add -keywords <version?+list> flag to define the list of keywords recognized by the lexer, for instance -keywords 5.2 disables the effect keyword. (Florian Angeletti, review by Gabriel Scherer)
#13257: integrate MetaOCaml in the Menhir grammar to ease MetaOCaml maintenance. This is a purely internal change: there is no support in the lexer, so no change to the surface OCaml grammar. (Oleg Kiselyov, Gabriel Scherer and Florian Angeletti, review by Jeremy Yallop)
Manual and documentation
As an author, I have documented the modest Unicode support introduced in 5.3 and found the time to write down the compiler release cycles. However, most of my time working on the documentation has been focused on reviewing PRs, ranging from a small change to the manual CSS to an update to the manual section on polymorphic recursion proposed by a new contributor.
#13668: Document the basic support for unicode identifiers and the switch to UTF-8 encoded Unicode text for OCaml source file (Florian Angeletti, review by Nicolás Ojeda Bär and Daniel Bünzli)
#12949: document OCaml release cycles and version strings in release-info/introduction.md. (Florian Angeletti, review by Fabrice Buoro, Kate Deplaix, Damien Doligez, and Gabriel Scherer)
#12298: Manual: emphasize that Bigarray.int refers to an OCaml integer, which does not match the C int type. (Edwin Török, review by Florian Angeletti)
#12868: Manual: simplify style colours of the post-processed manual and API HTML pages, and fix the search button icon (Yawar Amin, review by Simon Grondin, Gabriel Scherer, and Florian Angeletti)
#12976: Manual: use webman/version/*.html and webman/version/api/ for OCaml.org HTML manual generation (Shakthi Kannan, review by Hannes Mehnert, and Florian Angeletti)
#13295: Use syntax for deep effect handlers in the effect handlers manual page. (KC Sivaramakrishnan, review by Anil Madhavapeddy, Florian Angeletti and Miod Vallat)
#13469, #13474, #13535: Document that [Hashtbl.create n] creates a hash table with a default minimal size, even if [n] is very small or negative. (Antonin Décimo, Nick Bares, report by Nikolaus Huber and Jan Midtgaard, review by Florian Angeletti, Anil Madhavapeddy, Gabriel Scherer, and Miod Vallat)
#13666: Rewrite parts of the example code around nested lists in Chapter 6 (Polymorphism and its limitations -> Polymorphic recursion) giving the “depth” function [in the non-polymorphically-recursive part of the example] a much more sensible behavior; also fix a typo and some formatting. (Frank Steffahn, review by Florian Angeletti)
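For readers unfamiliar with the polymorphic recursion section touched by #13666, here is a minimal sketch of the kind of nested-list example it discusses. It is written in the spirit of the manual chapter and is not a copy of the rewritten text; the `example` value is only there for illustration.

```ocaml
(* Nested lists: each [Nested] layer wraps the elements in one more list. *)
type 'a nested = List of 'a list | Nested of 'a list nested

(* The explicit ['a.] annotation is required here: the recursive call is
   made at type ['a list nested], not at type ['a nested]. *)
let rec depth : 'a. 'a nested -> int = function
  | List _ -> 1
  | Nested n -> 1 + depth n

(* One [Nested] layer around a list of lists of integers. *)
let example : int nested = Nested (List [ [ 7 ]; [ 8 ] ])
```

Without the `'a.` annotation, the typechecker would try to give `depth` a monomorphic type inside its own definition and reject the recursive call.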
Type system bug fixes
Last but not least, I spent many hours this release fixing internal errors due to inconsistent type constraints in the module system. I have also reviewed many bug fixes, in particular a series of issues in the typechecker related to the handling of non-injective type parameters.
#12959, #13055: Avoid an internal error on recursive module type inconsistency (Florian Angeletti, review by Jacques Garrigue and Gabriel Scherer)
#13388, #13540: raises an error message (and not an internal compiler error) when two local substitutions are incompatible (for instance module type S:=sig end type t:=(module S)) (Florian Angeletti, report by Nailen Matschke, review by Gabriel Scherer, and Leo White)
#13185, #13192: Reject type-level module aliases on functor parameter inside signatures. (Jacques Garrigue, report by Richard Eisenberg, review by Florian Angeletti)
#13306: An algorithm in the type-checker that checks two types for equality could sometimes, in theory, return the wrong answer. This patch fixes the oversight. No known program triggers the bug. (Richard Eisenberg, review by Florian Angeletti)
#13495, #13514: Fix typechecker crash while typing objects (Jacques Garrigue, report by Nicolás Ojeda Bär, review by Nicolas Ojeda Bär, Gabriel Scherer, Stephen Dolan, Florian Angeletti)
#13579, #13583: Unsoundness involving non-injective types + gadts (Jacques Garrigue, report by @v-gb, review by Richard Eisenberg and Florian Angeletti)
#13598: Falsely triggered warning 56 [unreachable-case] This was caused by unproper protection of the retyping function. (Jacques Garrigue, report by Tõivo Leedjärv, review by Florian Angeletti)
Miscellaneous
Beyond those major themes, I also had my hands or eyes on a few of the changes in the language, the standard library and the compiler build system.
Language features:
In terms of language features, I participated in the review of the newly introduced syntax for effect handlers (#12309, #13158) on the type system and documentation (#13295) sides.
let with_gen f = match f () with
  | effect Random_float, k -> Effect.Deep.continue k (Random.float 1.)
  | x -> x
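The handler above relies on an effect declaration that is not shown. For comparison, here is a self-contained sketch of the same handler written with the library function `Effect.Deep.match_with`, which was the way to write handlers before the new syntax. The `Random_float` declaration and the `sum_of_two` client are assumptions made for this illustration.

```ocaml
(* Assumed declaration of the effect used by the handler. *)
type _ Effect.t += Random_float : float Effect.t

(* Same handler as above, written with an explicit handler record. *)
let with_gen (f : unit -> 'r) : 'r =
  let open Effect.Deep in
  match_with f ()
    { retc = (fun x -> x);  (* value case: "| x -> x" *)
      exnc = raise;         (* exceptions are re-raised unchanged *)
      effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Random_float ->
            (* effect case: resume the computation with a random float *)
            Some (fun (k : (a, _) continuation) ->
                continue k (Random.float 1.))
          | _ -> None) }

(* Hypothetical client code performing the effect twice. *)
let sum_of_two () : float =
  Effect.perform Random_float +. Effect.perform Random_float
```

Calling `with_gen sum_of_two` then returns the sum of two random floats, each drawn by the handler rather than by the client code.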
I also finalized the new modest support for UTF-8 encoded Unicode source files in OCaml 5.3 (#11736, #12664, #13628)
type saison = Printemps | Été | Automne | Hiver
and documented it (#13668).
Type system
On the pure type system side, I have participated in the review of the extended support for annotating types in GADT patterns (#11891, #12507). It is now possible to give a name to all type variables introduced by pattern matching on a GADT constructor
type _ t =
  | A: 'a array -> 'a array t
  | S: string -> char array t
let len (type a) (x:a t) = match x with
  | A (type a) (x:a array) -> Array.length x
  | S x -> String.length x
whereas it was only possible to name existentially quantified type variables before.
Standard library:
I have reviewed two standard library Pull Requests (PRs):
#13168: In Array.shuffle, clarify the code that validates the result of the user-supplied function rand, and improve the error message that is produced when this result is invalid. (François Pottier, review by Florian Angeletti, Daniel Bünzli and Gabriel Scherer)
#13296: Add mem, memq, find_opt, find_index, find_map and find_mapi to Dynarray. (Jake H, review by Gabriel Scherer and Florian Angeletti)
and authored one PR exposing support for a hidden feature of the Format module:
- #12133: Expose support for printing substrings in Format (Florian Angeletti, review by Daniel Bünzli, Gabriel Scherer and Nicolás Ojeda Bär)
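As a small illustration of the feature exposed by #12133, here is a minimal sketch assuming OCaml 5.3 or later and the `Format.pp_print_substring ~pos ~len ppf s` signature from the 5.3 standard library. The `line` string and the `pp_slice` helper are hypothetical names chosen for this example.

```ocaml
(* Hypothetical source line from which we want to print only a slice. *)
let line = "let x = 1 + true"

(* Print the [len] characters of [s] starting at [pos], without
   calling String.sub by hand first. *)
let pp_slice ~pos ~len ppf s = Format.pp_print_substring ~pos ~len ppf s

(* Extract the last token of [line] (positions 12 to 15). *)
let highlighted : string =
  Format.asprintf "%a" (pp_slice ~pos:12 ~len:4) line
```

This kind of printer is convenient when a message needs to quote part of a larger buffer, for instance a fragment of a source line.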
Build system:
During the release, I also happened to review some build system changes:
#13285: continue the merge of the sub-makefiles into the root Makefile started with #11243, #11248, #11268, #11420, #11675, #12198, #12321, #12586, #12616, #12706 and #13048. (Sébastien Hinderer, review by David Allsopp and Florian Angeletti)
(breaking change) #13070: On Windows, when configured with bootstrapped flexdll, don’t add +flexdll to the search path when -nostdlib is specified (which then means -L path-to-flexdll no longer gets passed to the system linker). (David Allsopp, review by Florian Angeletti)