Since version 7, GHC treats local let-definitions as monomorphic. We check how such a change would fare in OCaml.

In their 2010 paper "Let should not be generalised", Dimitrios Vytiniotis, Simon Peyton Jones and Tom Schrijvers argue that this choice allows for simpler type inference, with a very limited impact on the existing code base. Independently of the potential benefits of such a choice, it is interesting to see how much polymorphism is really needed in OCaml.

To check the impact of a monomorphic let, I did a small experiment, modifying the OCaml compiler to only generalize let-definitions when there is an explicit polymorphic-type annotation. (Strictly speaking, this is not full let-monomorphism, as we still generalize implicit parts inside the annotation, but this should not make any difference for our experiment.)
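To make the rule concrete, here is a minimal sketch (the names are mine, not from the patch): under the patched compiler, a local definition used at two different types is only accepted when it carries an explicit polymorphic-type annotation.

```ocaml
(* Without the 'a. annotation, a monomorphic let would fix [len] at
   the type of its first use (int list -> int), and the second use
   would be rejected.  With the explicit polymorphic annotation, the
   definition is generalized even by the patched compiler. *)
let pair_lengths () =
  let len : 'a. 'a list -> int = List.length in
  (len [1; 2; 3], len ["a"; "b"])
```

Of course, the standard compiler accepts both versions; the annotation only matters under the modified one.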

The patch can be found here.

Then I tried to compile various software with this modified compiler. Namely, the OCaml distribution itself, LablGTK, and Core/Async with their dependencies. At the beginning I was fixing missing polymorphism by adding annotations by hand, but soon I came to just recompile the file with the standard compiler when the modified compiler failed.

The results are summarized in this table:

Package         Number of files   Failed files   Ratio   Errors/Line
OCaml                       488             20      4%        1/7000
LablGTK                     101              7      7%        1/3000
Oasis setup.ml                1              1    100%        1/6000
Sexplib                      12              1      8%        1/3500
Variantslib                   2              1     50%         1/500
Core_kernel                 170             16      9%        1/1800
Core                        178              4      2%        1/4000
Async_core                   36              4     11%        1/1600
Async_unix                   31              7     23%        1/1000
Async_extra                  22              2      9%        1/3000

Total                      1041             63      6%        1/1700

Since we counted failed files rather than individual polymorphic let-definitions, the results are a bit rough. Still, they tell us that about 6% of files did not compile, and that at least 1 line in 1700 had to be modified.

A comparison with the GHC distribution may be interesting: according to the OutsideIn paper, 20 modules out of 533 had to be modified, which is about 4%. This is a bit lower than the result for OCaml, but GHC files tend to be shorter. They also found that 12% of packages on Hackage needed modification. This figure is not very informative on its own, but it seems much lower than our results for OCaml would suggest, since a package usually contains several modules.

More important is how easy it is to fix the errors. Most polymorphic let-definitions fall into one of three categories:

  • functions specialized to some local parameter, but still polymorphic
  • uses of Obj.magic
  • polymorphic variants

The second case is a bit special, and one could actually argue that it is better to annotate all uses of Obj.magic anyway. Still, these were probably cases where there was a better place for the annotation.

Most cases actually pertain to the first category. Fixing those is not as easy as one might think. The types involved tend to be large, and sometimes difficult to reconstruct by hand. A solution is to move the function to the toplevel, as it will then be automatically generalized, but this has two drawbacks: first, the enclosing function may be large, so that the local definition is moved far away from its use point; second, we have to perform lambda-lifting manually, abstracting over local parameters and passing them explicitly at each use point.
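A hypothetical example of the first category: the local helper below is specialized to the enclosing parameter, yet still polymorphic in its own argument. The explicit annotation shows the kind of type one has to reconstruct by hand; in real code these types are often much larger.

```ocaml
(* [tag] captures the local parameter [prefix], so moving it to the
   toplevel would require lambda-lifting over [prefix].  Under a
   monomorphic let, the two uses below, at types int and string,
   would need the explicit 'a. annotation to remain legal. *)
let render prefix =
  let tag : 'a. ('a -> string) -> 'a -> string =
    fun show x -> prefix ^ show x
  in
  (tag string_of_int 3, tag (fun s -> s) "b")
```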

Finally, polymorphic variants are particularly troublesome: the inferred types can be huge, and since we precisely need them to stay polymorphic, type abbreviations may not be of much help.
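A small sketch of the problem (again with names of my own): the row-polymorphic type inferred for a local function over polymorphic variants is easy to rely on, but painful to write down explicitly.

```ocaml
(* [wrap] is inferred at the open type 'a -> [> `Ok of 'a ]:
   polymorphic both in 'a and in the row variable.  A generalizing
   let allows the two uses below, at unrelated variant types; a
   monomorphic let would reject the second one, and annotating
   [wrap] means spelling out the row types by hand. *)
let both () =
  let wrap x = `Ok x in
  let a : [ `Ok of int | `Err of string ] = wrap 1 in
  let b : [ `Ok of string ] = wrap "x" in
  (a, b)
```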

What conclusions can we draw?

Notice that these figures are comparable to experiments conducted when the value restriction was introduced. In that case, it was argued that eta-expansion solved most situations at a very low syntactic (and computational) cost. For the monomorphic let, it suffices to provide an annotation to recover polymorphism, so the solution is universal, but its cost for the programmer is higher.
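For comparison, the eta-expansion workaround for the value restriction looks like this:

```ocaml
(* Under the value restriction, the partial application below is not
   a syntactic value, so its type is only weakly polymorphic
   ('_weak1 list -> '_weak1 list). *)
let ids = List.map (fun x -> x)
let _ = ids [1; 2]   (* this first use pins the weak variable to int *)

(* Eta-expanding makes the right-hand side a syntactic value again,
   recovering full polymorphism ('a list -> 'a list) for the cost of
   one extra lambda. *)
let map_id l = List.map (fun x -> x) l
```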

Actually, this discussion is mostly academic, because in OCaml we have no specific reason to make local let-definitions monomorphic. By contrast with Haskell, which propagates type information through constraint solving, OCaml only performs forward propagation, and actually relies on polymorphism to decide when some type information does not depend on the order of type inference and is safe to use. GADTs create problems of their own, but we handle them through ambivalent types, a mechanism similar in spirit to the tracking of sharing used for type information, which again relies on local polymorphism rather than forgoing it.