This series of blog posts aims to give a short weekly glimpse into my (Florian Angeletti) work on the OCaml compiler. This week, the focus is on format strings and how to serialize partial error messages while still using the formatting engine from the Format module.

Last week, beyond some ongoing discussion on the refactoring of Dynlink and a draft for a future tutorial on GADTs, I spent some time refactoring and cleaning up my work on an alternative interpreter for OCaml format strings.

A serializable data type for Format messages

A medium-term objective for me this year is to make it possible for the compiler to emit machine-readable messages intended for the various OCaml development tools.

This would avoid the need for those tools to parse the compiler error messages or warnings, and make it simpler for external contributors to experiment with new error formats.

One of the obstacles to this objective is the difficulty of handling partial messages when using Format as a formatting engine. As an example, imagine that I want to print an error message with a prefix

@[Error:

and a main body

This expression has type int@ which is not a record type

If I want to preserve the break hint in the main body while making the body start right after the prefix, I need to print the two parts of the error message at the same time, for instance with:

Format.fprintf ppf "%t%t" prefix main_body

If I rendered each part of the message to a string before printing, I would lose the context that the Format module uses for indentation and line breaks. Similarly, I cannot start rendering the second message before the first. This means that Format requires us to always print messages in order.
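The loss of context can be observed with a small experiment. The following is a minimal sketch, not code from the compiler: the `render` helper, the margin of 58 columns, the trailing space in the prefix, and the closing `@]` added by the caller are all assumptions made for the illustration.

```ocaml
(* Render a suspended printer to a string with a given right margin. *)
let render ~margin (pr : Format.formatter -> unit) : string =
  let buf = Buffer.create 80 in
  let ppf = Format.formatter_of_buffer buf in
  Format.pp_set_margin ppf margin;
  pr ppf;
  Format.pp_print_flush ppf ();
  Buffer.contents buf

(* The prefix opens a box that the caller closes after the body. *)
let prefix ppf = Format.fprintf ppf "@[Error: "

let main_body ppf =
  Format.fprintf ppf "This expression has type int@ which is not a record type"

(* Both parts share one formatter: the break hint takes the prefix into
   account when deciding whether the rest of the message fits. *)
let together =
  render ~margin:58 (fun ppf -> Format.fprintf ppf "%t%t@]" prefix main_body)

(* Each part gets its own formatter: the body is laid out as if it
   started at column 0, and the strings are glued together afterwards. *)
let separately =
  render ~margin:58 prefix
  ^ render ~margin:58 (fun ppf -> Format.fprintf ppf "@[%t@]" main_body)
```

With these choices, `together` contains a line break at the hint, whereas `separately` is a single line that overflows the margin, because the body was laid out without knowing that the prefix came first.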

This complexity is reflected in the type of the compiler error report, where partial messages are represented as suspended closures:

type msg = (Format.formatter -> unit) loc
type report = {
  kind : report_kind;
  main : msg;
  sub : msg list;
}

This representation creates three issues:

  • First, one must be very careful that the delayed closure does not capture the wrong global state.
  • Second, it is not serializable.
  • Third, it is cumbersome: for instance, warning messages were never converted to this format.

As surprising as it may sound, the first grievance rears its head fairly often in the compiler code base, because the pretty-printer for types is full of global state: some state to track loops, some other state to track naming decisions, and yet another to track shortest path names.
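The capture problem can be reproduced in a few lines. This is only a sketch: `current_name` is a hypothetical stand-in for the real naming state of the type printer, and `pp_type`, `delayed` and `pinned` are names invented for the example.

```ocaml
(* Hypothetical global state, standing in for the naming tables of the
   type printer. *)
let current_name : string ref = ref "t"

let pp_type ppf () = Format.pp_print_string ppf !current_name

(* A suspended message: the global state is read only when the closure
   is finally run. *)
let delayed : Format.formatter -> unit =
  fun ppf -> Format.fprintf ppf "@[This expression has type %a@]" pp_type ()

(* Reading the state eagerly pins the name chosen at the error site. *)
let pinned : Format.formatter -> unit =
  let name = !current_name in
  fun ppf -> Format.fprintf ppf "@[This expression has type %s@]" name

(* The state changes between the creation of the messages and their
   rendering. *)
let () = current_name := "M.t"

let delayed_text = Format.asprintf "%t" delayed
let pinned_text = Format.asprintf "%t" pinned
```

Here `delayed_text` mentions `M.t` even though the message was created when the name was `t`, while `pinned_text` keeps the original name. Every suspended closure in a report has to be audited for this kind of late read.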

As a way to circumvent this issue, I have been working on an immutable interpreter for format strings, which translates a format string into a sequence of formatting instructions that can be interpreted later by a formatting engine.

For instance, with this interpreter, the call

Format_doc.Immutable.printf
  "@[This is a text with %s,@ breaks and @[%d box@].@]"
   "one hole" 2
   Format_doc.Doc.empty

is rendered to the following sequence of instructions for Format:

[
 Open_box {kind = B; indent = 0};
 Data "This is a text with ";
 Data "one hole";
 Data ",";
 Simple_break {spaces = 1; indent = 0};
 Data "breaks and ";
 Open_box {kind = B; indent = 0};
 Data "2";
 Data " box";
 Close_box;
 Data ".";
 Close_box
]

One advantage of this representation is that we have transformed the format string into data, with no closures in sight. The format is thus inherently serializable and does not rely on any captured state.
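To make this concrete, here is a rough sketch of what such an instruction type and a replay function could look like. The constructor names mirror the output shown above, but the exact type definitions, the `box_kind` variants other than `B`, and the `replay` function are assumptions of this sketch rather than the actual definitions of the library.

```ocaml
type box_kind = H | V | HV | HoV | B

type element =
  | Data of string
  | Open_box of { kind : box_kind; indent : int }
  | Close_box
  | Simple_break of { spaces : int; indent : int }

type doc = element list

(* Interpret the stored instructions with the ordinary Format engine. *)
let replay (ppf : Format.formatter) (doc : doc) : unit =
  let instr = function
    | Data s -> Format.pp_print_string ppf s
    | Open_box { kind; indent } ->
      begin match kind with
        | H -> Format.pp_open_hbox ppf ()
        | V -> Format.pp_open_vbox ppf indent
        | HV -> Format.pp_open_hvbox ppf indent
        | HoV -> Format.pp_open_hovbox ppf indent
        | B -> Format.pp_open_box ppf indent
      end
    | Close_box -> Format.pp_close_box ppf ()
    | Simple_break { spaces; indent } ->
      Format.pp_print_break ppf spaces indent
  in
  List.iter instr doc

let example : doc = [
  Open_box { kind = B; indent = 0 };
  Data "This is a text with "; Data "one hole"; Data ",";
  Simple_break { spaces = 1; indent = 0 };
  Data "breaks and ";
  Open_box { kind = B; indent = 0 };
  Data "2"; Data " box";
  Close_box;
  Data ".";
  Close_box;
]

(* Printing directly and replaying the stored document should agree. *)
let direct =
  Format.asprintf "@[This is a text with %s,@ breaks and @[%d box@].@]"
    "one hole" 2
let replayed = Format.asprintf "%a" replay example

(* Plain data survives a round trip through Marshal without any flag. *)
let roundtrip : doc = Marshal.from_string (Marshal.to_string example []) 0
```

Since the document is an ordinary list of constructors, it can be compared, stored, or sent to another process before any layout decision is taken.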

Moreover, with a bit of GADTs, we can create a compatibility layer between the classical Format interpreter and the new immutable interpreter.

First, we define compatibility formatters as

type rdoc = Doc.t ref
type _ formatter =
  | Format: Format.formatter -> Format.formatter formatter
  | Doc: rdoc -> rdoc formatter

Then the actual printing functions can choose which underlying function to call depending on the formatter:

let pp_print_string (type i) (ppf: i formatter) s = match ppf with
  | Format ppf -> Format.pp_print_string ppf s
  | Doc rdoc -> rdoc := Immutable.string s !rdoc
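The dispatch can be tried out in isolation with the following self-contained sketch. The `Doc` and `Immutable` modules below are deliberately simplified stand-ins that only model string data (the real ones are richer), and `greeting` is a made-up printer.

```ocaml
(* Simplified stand-ins for the library modules. *)
module Doc = struct
  type element = Data of string
  type t = element list
  let empty : t = []
end

module Immutable = struct
  (* Return a new document with [s] appended; the input is untouched. *)
  let string (s : string) (doc : Doc.t) : Doc.t = doc @ [ Doc.Data s ]
end

type rdoc = Doc.t ref

type _ formatter =
  | Format : Format.formatter -> Format.formatter formatter
  | Doc : rdoc -> rdoc formatter

let pp_print_string (type i) (ppf : i formatter) (s : string) : unit =
  match ppf with
  | Format ppf -> Format.pp_print_string ppf s
  | Doc rdoc -> rdoc := Immutable.string s !rdoc

(* A printer written once against the compatibility formatter. *)
let greeting ppf =
  pp_print_string ppf "Hello, ";
  pp_print_string ppf "world"

(* The same printer drives the classical engine... *)
let as_text = Format.asprintf "%t" (fun ppf -> greeting (Format ppf))

(* ...or accumulates an immutable document behind a reference. *)
let as_doc =
  let r = ref Doc.empty in
  greeting (Doc r);
  !r
```

The type index of the formatter records which backend is in use, so each primitive can pattern match on it without any runtime tag beyond the constructor itself.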

Splitting all primitive functions of the Format module in this way gives us a new fprintf function with type:

val fprintf : 'impl formatter -> ('a,'impl formatter,unit) format -> 'a

With this compatibility layer in place, converting a Format printer is a matter of adding a single open Format_doc.Compat.

I am still pondering the implementation and design of this alternative printing module. It is thus probable that I will end up tying the final PR on this feature, but I have made the implementation available as a small format-doc library.