One of my parsing-related Coq files takes more than two hours to compile, which is pretty annoying for a file that is less than one thousand lines long and does no complex computations. After some profiling, it appeared that Coq was spending most of its time in the Univ module. This module is part of the Coq kernel; its role is to check that the user is using universes in a consistent way.

You can get an introduction to what universes are in this book chapter. In Coq, universes are implicit: it is the role of the Coq kernel to determine whether it is possible to assign an index to each occurrence of Type in a way that does not introduce inconsistencies.

This computation is done in the Univ module: the rest of the Coq kernel generates constraints, and this module is in charge of detecting when they are not satisfiable. Roughly, these constraints are of the form "universe A has an index strictly smaller than universe B" or "universe A has an index smaller than or equal to universe B". To check that they are satisfiable, it is enough to view them as a graph whose vertices are universes and whose edges are constraints, and to verify that no cycle contains a strict constraint.
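To make the graph view concrete, here is a minimal sketch of such a check in OCaml (the types and names are mine, not the ones used in Univ). Universes are plain integers, and the constraints are unsatisfiable exactly when some strict edge from A to B can be closed into a cycle, that is, when A is reachable from B:

```ocaml
(* Universes as integers; a constraint is an edge between two of them.
   Lt is "strictly smaller", Le is "smaller than or equal". *)
type kind = Lt | Le
type edge = { src : int; dst : int; kind : kind }

(* Universes directly reachable from [u] through one constraint. *)
let successors edges u =
  List.filter_map (fun e -> if e.src = u then Some e.dst else None) edges

(* Is [target] reachable from [start] by following edges? Plain DFS. *)
let reachable edges start target =
  let visited = Hashtbl.create 16 in
  let rec dfs u =
    u = target
    || (not (Hashtbl.mem visited u)
        && (Hashtbl.add visited u ();
            List.exists dfs (successors edges u)))
  in
  dfs start

(* Satisfiable iff no strict edge lies on a cycle, i.e. for no strict
   edge A < B is A reachable from B. *)
let satisfiable edges =
  List.for_all
    (fun e -> e.kind <> Lt || not (reachable edges e.dst e.src))
    edges

let () =
  let ok = [ { src = 0; dst = 1; kind = Lt };
             { src = 1; dst = 2; kind = Le } ] in
  let bad = { src = 2; dst = 0; kind = Le } :: ok in
  assert (satisfiable ok);
  assert (not (satisfiable bad))   (* 0 < 1 <= 2 <= 0 is a cycle *)
```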

This graph can be simplified: if there is a cycle made only of large (non-strict) constraints, then all the universes in that cycle are necessarily equivalent, and we can collapse the cycle into a single universe. We then have to keep track of these equivalences, using a specialized data structure. This is the classical union-find problem, which has very efficient solutions.
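As a baseline, a naive mutable union-find, with no balancing at all, can be sketched like this (again my own toy version, not the actual Univ code):

```ocaml
(* Naive union-find: each node points to a parent; roots point to
   themselves. Nothing prevents the trees from degenerating. *)
type node = { mutable parent : int }

let create n = Array.init n (fun i -> { parent = i })

(* Follow parent links up to the representative of the class. *)
let rec find uf i =
  if uf.(i).parent = i then i else find uf uf.(i).parent

(* Merge two classes by pointing one root at the other, arbitrarily. *)
let union uf i j =
  let ri = find uf i and rj = find uf j in
  if ri <> rj then uf.(ri).parent <- rj
```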

However, Coq uses a naive approach for this task: it builds equivalence trees, but does nothing to prevent these trees from becoming very unbalanced. And that was indeed the problem in my file: for some reason, my tactics generated hundreds of universe constraints, and all these universes were inferred to be equivalent. To record this information, a long chain of universes was created: for each universe, Coq knows that it is equivalent to the next one in the chain, but it has no direct link to the head of the chain, the only node actually present in the graph. So Coq was spending most of its time walking this long chain...
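Reusing the naive sketch above, an unlucky merge order reproduces exactly this pathology: the equivalence tree degenerates into a linked list, and each find has to walk it entirely:

```ocaml
(* Merging the class of [i] into a fresh singleton at each step builds
   the chain 0 -> 1 -> ... -> n-1, so [find uf 0] walks n-1 links and
   the n-1 merges cost O(n^2) overall. *)
let () =
  let n = 10_000 in
  let uf = create n in
  for i = 0 to n - 2 do
    union uf i (i + 1)   (* the root of {0..i} now points to i+1 *)
  done;
  assert (find uf 0 = n - 1)
```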

In the literature, there are two main heuristics for this (classical) problem: "union by rank" and "path compression". Either one alone gives logarithmic amortized complexity, and the two combined give practically constant amortized complexity. From experience, I know that each of these heuristics gives very good performance even when used alone, because the worst case is very rare. Moreover, path compression is not easy to implement in a persistent way, because it relies on side effects. So I decided to implement "union by rank" inside the Univ module of Coq.
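For comparison, here is what path compression looks like on the mutable sketch above; the in-place parent rewrites are precisely the side effects that make a persistent implementation awkward:

```ocaml
(* Path compression: after locating the root, rewrite every node on the
   visited path so that it points directly at the root. *)
let rec find_compress uf i =
  let p = uf.(i).parent in
  if p = i then i
  else begin
    let root = find_compress uf p in
    uf.(i).parent <- root;  (* destructive shortcut for future lookups *)
    root
  end
```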

It was not really difficult: I just added a rank field in the record representing the nodes of the graph, which contains the length of the longest chain of equivalences leading to this node. When merging two nodes, we can then choose to keep the one with the longer chain as the head node. This keeps the graph balanced: since the other node's chain was shorter, the length of the longest chain does not increase (it only grows by one when both chains have equal length).
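In union-find vocabulary, this is plain union by rank. A minimal sketch of the idea follows (my own version, not the actual patch, which works on Coq's richer node records):

```ocaml
(* Union by rank: a root's [rank] is the length of the longest chain of
   parent links below it. Merging keeps the higher-ranked root on top. *)
type rnode = { mutable parent : int; mutable rank : int }

let rcreate n = Array.init n (fun i -> { parent = i; rank = 0 })

let rec rfind uf i =
  if uf.(i).parent = i then i else rfind uf uf.(i).parent

let runion uf i j =
  let ri = rfind uf i and rj = rfind uf j in
  if ri <> rj then begin
    if uf.(ri).rank < uf.(rj).rank then
      uf.(ri).parent <- rj             (* shorter chain hangs below the longer *)
    else if uf.(ri).rank > uf.(rj).rank then
      uf.(rj).parent <- ri
    else begin
      uf.(rj).parent <- ri;            (* equal ranks: pick one head... *)
      uf.(ri).rank <- uf.(ri).rank + 1 (* ...its longest chain grows by one *)
    end
  end
```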

I got a spectacular performance improvement on my own code: compilation time went from more than two hours down to about 6.5 minutes. On the Coq standard library, I get a more modest 1% improvement, which is expected: the standard library does not contain a lot of complex polymorphic code.

You can see the current patch (which may still evolve a bit) in this pull request. My first implementation was against 8.3; this patch is on the trunk version of Coq, and we're planning to port it to 8.4 soon. Do not hesitate to test it if you have universe-heavy developments; any feedback is welcome.