Current (and future?) state of OCamlbuild parallelization
- May 18, 2014
A few days ago, Hugo Heuzard asked about the details of the problem with OCamlbuild’s parallelism today on mantis: PR#5754. My mantis comment quickly grew into the blog post below.
I worked last September on better parallelization for OCamlbuild, but didn't finish the thing (I couldn't get something mergeable in time for ICFP) and got distracted with tons of other things to do since. I just uploaded the state of my branch:
https://github.com/gasche/ocaml/tree/ocamlbuild-indirect-style
Why ocamlbuild’s parallelization is often disappointing today
Below is an explanation of my understanding of the current problem with ocamlbuild's parallelization. OCamlbuild's central form of abstraction is the "rule action", an OCaml function that knows how to build a target. In fact, the action
- is passed a “building function” to invoke whenever it needs to build a dependency of the target
- returns exactly one command (think of a command-line invocation) that will finally produce the target
For example, the rule to produce a .cmo from a .ml file whose .mli file is also present is defined here in ocaml_specific.ml:

rule "ocaml: ml & cmi -> cmo"
  ~prod:"%.cmo"
  ~deps:["%.mli"; "%.ml"; "%.ml.depends"; "%.cmi"]
  (Ocaml_compiler.byte_compile_ocaml_implem "%.ml" "%.cmo");;
This is a library call that registers the rule action along with some metadata about it (what we know, statically, about its targets and dependencies; this is used by the ocamlbuild “solver” to understand when to invoke the rule).
The action itself is Ocaml_compiler.byte_compile_ocaml_implem. It reads as follows (with added comments):
(* to build foo.cmo from foo.ml *)
let byte_compile_ocaml_implem ?tag ml cmo env build =
  let ml = env ml and cmo = env cmo in
  (* builds the dependencies listed in foo.ml.depends *)
  prepare_compile build ml;
  (* then produce the actual "ocamlc -c ..." command-line *)
  ocamlc_c (Tags.union (tags_of_pathname ml) (tags_of_pathname cmo)
            ++ "implem" +++ tag) ml cmo
ml and cmo are explicitly passed to the function in ocaml_specific.ml (as "%.ml" and "%.cmo"). The action proper is the resulting function fun env build -> ...: it takes an environment and a builder function, and returns a command. The type of actions can be found in signatures.mli:
(** This is the type for rule actions. An action receive as argument, the
environment lookup function (see {!env}), and a function to dynamically
build more targets (see {!builder}). An action should return the command
to run in order to build the rule productions using the rule dependencies.
*)
type action = env -> builder -> Command.t
env allows the action to translate the static patterns %.ml, %.ml.depends, etc. into the concrete values foo.ml, foo.ml.depends, etc., depending on which target matching the patterns is actually requested. The important part is in builder -> Command.t.
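
To make the interface concrete, here is a toy direct-style rule as one could write it in a myocamlbuild.ml plugin (this example is mine, not from the OCamlbuild sources; the rule name and extensions are made up): it produces %.copy by copying %.txt, requesting the source through build.

open Ocamlbuild_plugin

let () =
  dispatch (function
    | After_rules ->
      rule "toy: txt -> copy"
        ~prod:"%.copy"
        (fun env build ->
          (* env instantiates the rule patterns for the requested target *)
          let src = env "%.txt" and dst = env "%.copy" in
          (* build takes a list of lists (alternatives within each inner
             list); Outcome.ignore_good raises if the build failed *)
          List.iter Outcome.ignore_good (build [[src]]);
          (* the command that finally produces the target *)
          Cmd (S [A "cp"; P src; Px dst]))
    | _ -> ())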
The prepare_compile function, when invoked during the action for foo.ml -> foo.cmo, will look at foo.ml.depends (which is marked as a static dependency of the rule, and hence has already been built when the action is invoked) and build all the modules it requests:
let prepare_compile build ml =
  let dir = Pathname.dirname ml in
  let include_dirs = Pathname.include_dirs_of dir in
  (* reads the module names from foo.ml.depends *)
  let modules = path_dependencies_of ml in
  (* invoke the `build` function to build them all (in parallel) *)
  let results =
    build (List.map (fun (_, x) -> expand_module include_dirs x ["cmi"]) modules) in
  (* inspects the build results to sometimes complain; returns () *)
  List.iter2 begin fun (mandatory, name) res ->
    match mandatory, res with
    | _, Good _ -> ()
    | `mandatory, Bad exn ->
        if !Options.ignore_auto then
          dprintf 3 "Warning: Failed to build the module \
                     %s requested by ocamldep" name
        else raise exn
    | `just_try, Bad _ -> ()
  end modules results
After this function is executed, all dependencies of the rule (the static ones, which were known in advance, and the dynamic ones, which are whichever targets were passed to build and actually succeeded) are present, and we only need to return the final command:
let ocamlc_c tags arg out =
  let tags = tags++"ocaml"++"byte" in
  Cmd (S [!Options.ocamlc; A"-c"; T(tags++"compile");
          ocaml_ppflags tags; ocaml_include_flags arg; A"-o"; Px out; P arg])
(Note that the command-line has some tags: some built-in ones passed here, and some depending on the pathname (that is, looked up in _tags) passed in the byte_compile_ocaml_implem function. They will get translated into actual command-line options by flags declarations (built-in or in the user's myocamlbuild.ml). That's orthogonal to the current topic.)
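
(For illustration only, a flags declaration in a user's myocamlbuild.ml can look like the line below; the "my_warnings" tag is made up.)

(* Files tagged "my_warnings" in _tags get these extra options on
   compilation commands. *)
let () = flag ["ocaml"; "compile"; "my_warnings"] (S [A "-w"; A "+a-4"])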
The opportunities for parallelization come from the fact that the build function takes a list of targets to build (in fact it takes a list of lists: with input [[A; Abis]; [B; Bbis]] it will try to build (either A, or Abis if it fails) and-in-parallel (either B, or Bbis if it fails)). If you ask ocamlbuild to build bar.cmo, baz.cmo and foobar.cmo, it will:
- call the relevant rule for each, to get a command-line for each subtarget
- call all those command-lines in parallel (this is the ocamlbuild_executor business)
Any dynamic dependency (call to build) is executed before the final command is returned and will not be parallelized across subtargets; only the final production step will be.
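
This list-of-lists convention is reflected directly in the builder type from signatures.mli: each inner list is a series of alternatives, the outer list is what gets built in parallel, and one outcome is returned per inner list.

type builder = Pathname.t list list -> (Pathname.t, exn) Outcome.t list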
In practice, this works reasonably well for rules that don't take a lot of time to return a command but whose command runs long enough (e.g. .cmx production when dependency chains are flat enough). But it doesn't work very well when a good share of the work for each subtarget is performed before returning that final command-line to execute.
My draft of a solution
Consider the following scenario: the target I'm currently building has dependencies A and B, and:
- A's action calls build on [A1; A2], and then returns the command comA
- B's action calls build on [B1; B2], and then returns the command comB
Today [A1; A2] are run in parallel, then [B1; B2] are run in parallel (or maybe B's before A's), and finally [comA; comB] are executed in parallel. What we would rather like is for [A1; A2; B1; B2] to be built in parallel, then [comA; comB] in parallel.
We can't do that with the current interface because actions are just functions from build to Command.t: they do not give their caller any information about the dynamic dependencies they may decide to build. Instead of just being a function that takes a build and returns a Command.t, an action needs to produce a datatype that says either:
- I'm done, here is the final Command.t to produce my target
- Not yet, I want you to build those dependencies, and then resume my build with this next action ("continuation")
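
To make this concrete, here is a minimal sketch of what such a datatype could look like (the names below are mine, chosen for illustration; they are not necessarily those used in the branch):

(* Sketch only: an action is now data that either carries the final
   command, or requests more dependencies and provides a continuation. *)
type indirect_action =
  | Done of Command.t
      (* the final command that produces the target *)
  | Needs of Pathname.t list list
             * ((Pathname.t, exn) Outcome.t list -> indirect_action)
      (* dependencies to request (same list-of-lists convention as the
         build function), and the continuation to resume once their
         outcomes are known *)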
As the caller of both A and B in parallel, I would get two "Not yet" answers, with a request to build [A1; A2] on one side and [B1; B2] on the other; I can build them all in parallel, and then continue with the two continuations in parallel.
That means that instead of writing rules in direct style (calling build to produce dependencies), actions should invert their control flow and pass their dependencies back to their caller. You have probably heard this music before: this screams of monads.
What I started doing in the aptly named ocamlbuild-indirect-style is exactly this: define a type of actions that is data instead of a blackbox function, use it for all built-in rule descriptions, and implement an “execution” function that evaluates such a rule to actually produce a result.
The nice thing with this change is that it can be done in a very incremental way: it's easy to keep exposing the old interface where rule expects a function as its action, and to expose on the side a new interface with an indirect_rule function whose action uses the new action description format. You can then let users convert their rules if they want to, at their own leisure (it's even possible to mix both styles in the same action description); just converting the built-in OCaml-specific rules to the indirect style should already give a nice parallelization boost.
If I remember correctly, when I stopped working on that (expecting to resume soon, of course), the missing piece was a good "evaluation" function that would take an indirect action and execute it correctly, exploiting the extra parallelism information. The naive way to do it doesn't quite work, and you have to be careful about sharing work when two rules end up requiring the same subtarget at different times in the build; building an explicit task-dependency graph could be a good way to do it properly.
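
To see where the difficulty lies, consider the obvious recursive evaluator over the datatype sketched above (again hypothetical code, not taken from the branch):

(* Naive evaluation: resolve each Needs request as soon as it is met.
   This degenerates to today's behaviour, because requests coming from
   different actions are never batched together, so the extra
   parallelism information is thrown away; and it does nothing clever
   about two actions requesting the same subtarget at different times. *)
let rec run_naive (build : builder) (action : indirect_action) : Command.t =
  match action with
  | Done cmd -> cmd
  | Needs (deps, k) -> run_naive build (k (build deps))

A proper evaluator would instead collect the pending requests of all in-progress actions, run them as a single batch, and keep track of which subtargets are already built or in flight; that is essentially the explicit task-dependency graph mentioned above.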
In the branch, the code I've shown previously looks as follows (ocamlc_c is strictly unchanged):
let prepare_compile ml =
  let dir = Pathname.dirname ml in
  let include_dirs = Pathname.include_dirs_of dir in
  let modules = path_dependencies_of ml in
  let module_build_order (mandatory, name) =
    (expand_module include_dirs name ["cmi"],
     fun result -> match mandatory, result with
       | _, Good _ -> ()
       | `mandatory, Bad exn ->
           if !Options.ignore_auto then
             dprintf 3 "Warning: Failed to build the module \
                        %s requested by ocamldep" name
           else raise exn
       | `just_try, Bad _ -> ())
  in
  build_order (List.map module_build_order modules)

let byte_compile_ocaml_implem ?tag ml cmo env =
  let ml = env ml and cmo = env cmo in
  bind (prepare_compile ml) & fun () ->
  let tags =
    Tags.union (tags_of_pathname ml) (tags_of_pathname cmo)
    ++ "implem" +++ tag in
  return (ocamlc_c tags ml cmo)
build_order is one of the constructors of those new indirect-style actions; it takes a list of pairs of a dependency to build (here the result of expand_module ...) and a cleanup/validation function to be executed on the build result (the same checking as before). The byte_compile_ocaml_implem function uses a monadic bind instead of ;, but that's about the only change.
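
For reference, the pieces used above could have signatures along these lines (illustrative only; the actual types in the branch may differ):

(* Hypothetical signatures for the indirect-style combinators. *)
module type INDIRECT_ACTIONS = sig
  type indirect_action

  (* an action that is already done: just run this command *)
  val return : Command.t -> indirect_action

  (* request a list of dependencies, each given as a list of alternative
     pathnames together with a validation function on its build outcome *)
  val build_order :
    (Pathname.t list * ((Pathname.t, exn) Outcome.t -> unit)) list ->
    indirect_action

  (* run the first action, then continue with the second *)
  val bind : indirect_action -> (unit -> indirect_action) -> indirect_action
end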
Remark
If you heard about Jenga, you may have noticed the remarkable similarity between the monadic version I describe here and Jenga’s rule design (or maybe you know about Shake and also found a strong link). I noticed myself when I first saw a talk about Jenga… a few months after doing this work. The indirect-style rules are only a small (and extremely natural) improvement over OCamlbuild’s current state, which is in fact quite close to Jenga’s design to start with.
In particular, one thing I am curious about is whether it would be possible to reuse Jenga's runtime (the part that takes the dependency graph and runs it as fast as possible, or does clever things with caching of previous runs and filesystem notifiers). But this experiment would be more invasive than the work strictly necessary to just improve parallelization in OCamlbuild (which has no reason to be particularly slow on its own), so it's not a priority.
Remark 2
If you’re interested in contributing to OCamlbuild, there is certainly interesting work to be done in the area of parallelization. My gut feeling, however, is that other issues may be more pressing (and less time-consuming), such as satisfying reasonable feature requests, or helping improve the documentation.