Batteries Experiences

Gabriel Scherer

OCaml Paris Meeting, July 2nd

Introduction

This talk

Short description of the Batteries-included project.

Focus: interesting internal techniques that other OCaml developers could reuse.

The Batteries Project

What is Batteries?

Free software: a community-maintained set of basic libraries.

 

No code ownership (anyone can add and improve code).

 

Goal: solve the "no standard library" problem.

Birth

In late 2007 and early 2008, there was a spark of community activity around OCaml.

At OUM, the lack of an extensive standard library was identified by the OCaml community as the major issue.

Infancy

Builds upon Extlib, a similar effort (Nicolas Cannasse et al., mostly active 2003-2007) that provides a solid codebase strictly extending the standard library.

Ambitious goals:

Releases in late 2008, early 2009.

Maturation

By the end of 2009, contributions had slowed down and Yoric's time-consuming job required a change in organization.

Lots of code included, uneven quality, too large to maintain.

Eric "thelema" Norige sheds a lot of Batteries' weight, and development starts again on a restricted codebase (AAA-batteries). Release 1.0 in January 2010.

Strict backward compatibility requirements (semantic versioning).

Since then, Batteries has seen reasonable code growth, a few releases a year, and more pragmatic goals -- no world-changing syntax extensions.

Present day

Commits since last release (non-injective naming).

30  Eric Norige
17  Cedric Cellier
14  Gabriel Scherer
13  Francois Berenger
12  Rudi Grinberg
 7  Seung Cheol Jung
 5  Christopher Zimmermann
 3  Gabriel Kerneis
 3  Vincent Hugot
 2  Kaustuv Chaudhuri
 1  Glyn Webster
 1  Kensuke Matsuzaki
 1  wistery-k

Next release should be soon.

Some experiences

Large libraries run into tool and language problems

Batteries was held back by the lack of unified package management in the OCaml community. Back then ocamlfind was not a given. Syntax extensions were found too hard to deploy.

 

Parametrized types and module signatures don't always mix very well. What's the common interface to Array and Map as associative containers?
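
One way to see the mismatch is to try writing the shared signature. The sketch below is only an illustration: the ASSOC signature and the wrapper modules are hypothetical names, not part of Batteries. Arrays and functorized maps fit a signature with a fixed key type, but a polymorphic map with two type parameters does not.

```ocaml
(* Hypothetical candidate signature: the key type is fixed by the
   module, only the value type is a parameter. *)
module type ASSOC = sig
  type key
  type 'v t
  val find : key -> 'v t -> 'v
end

(* Arrays fit, with integer keys. *)
module ArrayAssoc : ASSOC with type key = int and type 'v t = 'v array =
struct
  type key = int
  type 'v t = 'v array
  let find i a = a.(i)
end

(* A functorized map fits too, one instance per key type. *)
module StringMap = Map.Make (String)

module MapAssoc : ASSOC with type key = string
                         and type 'v t = 'v StringMap.t =
struct
  type key = string
  type 'v t = 'v StringMap.t
  let find = StringMap.find
end

(* A polymorphic map type ('k, 'v) t has two type parameters, so it
   cannot be matched against ASSOC: it needs a second, incompatible
   signature. *)
```

Each choice of signature shape excludes some containers, which is why a single common interface is hard to state.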

Hard to maintain consistency

Starting from the compiler libraries and Extlib, it is difficult to present a coherent whole.

 

Jane Street Core probably did a better job there, by breaking backward compatibility.

Future-proof specification of functions is difficult

Saying "evaluation order is unspecified" doesn't work in practice.

 

Users assume implementation is spec.

 

You must make the right choice in one try.
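
A small sketch of why "unspecified" leaks. The two functions below are written for this illustration (they are not Batteries code): both return the same list, yet they apply f in opposite orders, and any caller whose f has side effects can tell them apart.

```ocaml
(* Applies f to the head before recursing on the tail. *)
let rec map_left_to_right f = function
  | [] -> []
  | x :: xs ->
    let y = f x in
    y :: map_left_to_right f xs

(* Recurses on the tail first, then applies f to the head. *)
let rec map_right_to_left f = function
  | [] -> []
  | x :: xs ->
    let ys = map_right_to_left f xs in
    f x :: ys

(* Run a map on [1; 2; 3] and record the order in which f was called. *)
let trace map =
  let seen = ref [] in
  let result = map (fun x -> seen := x :: !seen; x * 10) [1; 2; 3] in
  (result, List.rev !seen)

let () =
  let (r1, order1) = trace map_left_to_right in
  let (r2, order2) = trace map_right_to_left in
  assert (r1 = r2);          (* same result *)
  assert (order1 <> order2)  (* different observable behaviour *)
```

Switching from one implementation to the other is allowed by an "unspecified order" contract, but it breaks every user who relied on the order they observed.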

 

{Unit,Random} Testing is essential to understand specification questions. What are the edge cases?
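
A minimal sketch of random testing using only the standard library. The function take and the property are hypothetical examples chosen for this illustration. Writing the property forces a decision on the edge cases (negative n, n larger than the list) that a few hand-written unit tests might never exercise.

```ocaml
(* Hypothetical function under test: the first n elements of a list. *)
let rec take n = function
  | x :: xs when n > 0 -> x :: take (n - 1) xs
  | _ -> []

(* The property doubles as the specification for the edge cases:
   negative n behaves like 0, too-large n returns the whole list. *)
let prop_take n l =
  List.length (take n l) = min (max n 0) (List.length l)

(* Random list of 0 to 9 small integers. *)
let random_list st =
  List.init (Random.State.int st 10) (fun _ -> Random.State.int st 100)

(* Check the property on [count] random inputs, with a fixed seed
   so that failures are reproducible. *)
let run_random_tests ?(count = 1000) () =
  let st = Random.State.make [| 42 |] in
  let ok = ref true in
  for _i = 1 to count do
    let n = Random.State.int st 15 - 3 in (* ranges over -3 .. 11 *)
    if not (prop_take n (random_list st)) then ok := false
  done;
  !ok
```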

Managing contributions is easy

The only useful item in "Developers Guidelines"

Writing tests is mandatory. If you add or modify a Batteries feature, your patch must come with tests on the affected functions. We will not accept patches that don't come with the relevant tests. If you are doing performance optimizations, the patch must come with benchmarks measuring the performance changes.

 

Then just review patches.

Concrete tidbits

Magic tail-recursive map

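type 'a mut_list =
  { hd: 'a; mutable tl: 'a list }

(* helpers not shown on the slide; one possible definition *)
(* a mut_list cell has the same memory layout as a (::) cell *)
external inj : 'a mut_list -> 'a list = "%identity"

(* link a new cell holding x after dst, and return that new cell *)
let setcdr dst x =
  let r = { hd = x; tl = [] } in
  dst.tl <- inj r; r

(* beware write barrier costs *)
let map f = function
  | [] -> []
  | h :: t ->
    let rec loop dst = function
      | [] -> ()
      | h :: t -> loop (setcdr dst (f h)) t
    in
    let r = { hd = f h; tl = [] } in
    loop r t; inj r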
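
For comparison, a portable tail-recursive map can be written without any mutation by reversing twice. This is a sketch, and map_portable is a name introduced here for illustration. The trick above avoids the second traversal and the intermediate list, at the cost of relying on the memory representation of lists.

```ocaml
(* Tail-recursive map using only the standard library:
   List.rev_map builds the result reversed, List.rev restores the order. *)
let map_portable f l = List.rev (List.rev_map f l)

let () =
  (* works on long lists, with stack usage independent of the length *)
  let big = List.init 1_000_000 (fun i -> i) in
  let mapped = map_portable succ big in
  assert (List.length mapped = 1_000_000);
  assert (List.hd mapped = 1)
```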

In-line unit testing

let rsplit str ~by:sep =
  let p = rfind str sep in
  let len = length sep in
  let slen = length str in
  sub str 0 p, sub str (p + len) (slen - p - len)
(*$T rsplit
   rsplit "aGxG1" ~by:"G" = ("aGx","1")
   rsplit "aGHxGH1" ~by:"GH" = ("aGHx", "1")
   rsplit "aGxG1" ~by:"" = ("aGxG1", "")
   try rsplit "az" ~by:"G" |> ignore; false \
       with Not_found -> true
*)

In-line unit testing (2)

https://github.com/vincent-hugot/iTeML

# extract all qtest unit tests into a single ml file
$(QTESTDIR)/all_tests.ml: $(TESTABLE)
    qtest -o $@ --shuffle --preamble-file qtest/qtest_preamble.ml \
      extract $(TESTABLE)

_build/$(QTESTDIR)/all_tests.native: $(QTESTDIR)/all_tests.ml
    $(OCAMLBUILD) $(OCAMLBUILDFLAGS) -cflags -warn-error,+26 \
      -use-ocamlfind -pkg oUnit,QTest2Lib $(QTESTDIR)/all_tests.native

qtest: prefilter qtest-clean
    @_build/$(QTESTDIR)/all_tests.$(EXT)

Functorized testing with OUnit

Same test-base tests all Map interfaces.

  let test_choose () =
    "choose empty -> Not_found" @!
      (Not_found, fun () -> M.choose M.empty);
    let t = il [(1,2); (3,4)] in
    "mem (fst (choose t)) t" @?
      (M.mem (M.choose t |> fst) t);
    ()
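
The idea can be sketched with the standard library alone. This is an illustration, not the Batteries test-base: the MAP signature is a hypothetical minimal subset, and plain assert replaces the OUnit operators @! and @?. The test body is written once inside a functor and instantiated for each implementation.

```ocaml
(* Hypothetical minimal signature shared by the maps under test. *)
module type MAP = sig
  type 'v t
  val empty : 'v t
  val add : int -> 'v -> 'v t -> 'v t
  val mem : int -> 'v t -> bool
  val choose : 'v t -> int * 'v
end

module TestMap (M : MAP) = struct
  (* build a map from an association list *)
  let il l = List.fold_left (fun m (k, v) -> M.add k v m) M.empty l

  let test_choose () =
    (* choose on the empty map must raise Not_found *)
    assert (try ignore (M.choose M.empty); false with Not_found -> true);
    let t = il [(1, 2); (3, 4)] in
    assert (M.mem (fst (M.choose t)) t)
end

(* First instance: the standard library map on integers. *)
module IntMap = Map.Make (Int)
module StdlibTests = TestMap (struct
  type 'v t = 'v IntMap.t
  let empty = IntMap.empty
  let add = IntMap.add
  let mem = IntMap.mem
  let choose = IntMap.choose
end)

(* Second instance: association lists, run through the same tests. *)
module AssocTests = TestMap (struct
  type 'v t = (int * 'v) list
  let empty = []
  let add k v m = (k, v) :: List.remove_assoc k m
  let mem = List.mem_assoc
  let choose = function [] -> raise Not_found | b :: _ -> b
end)

let () = StdlibTests.test_choose (); AssocTests.test_choose ()
```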

Small benchmark lib to have performance discussions

  external primitive_int_compare : int -> int -> int 
    = "caml_int_compare" "noalloc"

  let naive_compare x y =
    (* this code has been used as BatInt.compare *)
    if x > y then 1
    else if y > x then -1
    else 0 in

  let mfp_compare (x : int) y =
    if x > y then 1
    else if y > x then -1
    else 0 in

  let samples = Bench.bench_n
    [
      "BatInt.compare", test BatInt.compare;
      "stdlib's compare", test Pervasives.compare;
      "external compare", test primitive_int_compare;
      "mfp's compare", test mfp_compare;
      "naive compare", test naive_compare;
    ]
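
The point of the comparison is the type annotation: without it, x > y is polymorphic and goes through the generic comparison primitive, while (x : int) lets the compiler emit a direct integer comparison. The sketch below measures this with Sys.time only. It does not use the Bench library, time_compare is a hypothetical helper, and the numbers depend on the machine and compiler flags, so no expected timings are given.

```ocaml
(* Polymorphic: comparisons go through the generic comparison routine
   (an inlining compiler may specialize some call sites). *)
let naive_compare x y =
  if x > y then 1 else if y > x then -1 else 0

(* Annotated: comparisons are specialized to integers. *)
let int_compare (x : int) y =
  if x > y then 1 else if y > x then -1 else 0

(* Time n calls of cmp, in CPU seconds; the accumulated sum is returned
   so that the calls cannot be discarded as dead code. *)
let time_compare cmp n =
  let start = Sys.time () in
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + cmp i (n - i)
  done;
  (Sys.time () -. start, !acc)

let () =
  let n = 1_000_000 in
  let (t_naive, r1) = time_compare naive_compare n in
  let (t_int, r2) = time_compare int_compare n in
  assert (r1 = r2); (* both functions compute the same results *)
  Printf.printf "naive: %.3fs  annotated: %.3fs\n" t_naive t_int
```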

Dumb prefilter.ml script for version hacks

##V3## let ( |> ) x f = f x
##V4## external (|>) : 'a -> ('a -> 'b) -> 'b = "%revapply"

##V3## let ( @@ ) f x = f x
##V4## external ( @@ ) : ('a -> 'b) -> 'a -> 'b = "%apply"
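
A sketch of how such a filter could work. This is an assumption about the mechanism, not the actual prefilter.ml, whose details may differ; filter_line and current_major are hypothetical names. A line tagged with the current major version is kept with its tag stripped, a line tagged with another version is dropped, and untagged lines pass through.

```ocaml
(* Major version of the running compiler, e.g. 4 for "4.14.1". *)
let current_major () =
  Scanf.sscanf Sys.ocaml_version "%d." (fun m -> m)

(* Returns Some line' if the line should be kept, None if dropped. *)
let filter_line ~major line =
  let tag = Printf.sprintf "##V%d##" major in
  let has_prefix p =
    String.length line >= String.length p
    && String.sub line 0 (String.length p) = p in
  if has_prefix tag then
    (* tagged for this version: keep the line, strip the tag *)
    let n = String.length tag in
    Some (String.sub line n (String.length line - n))
  else if has_prefix "##V" then
    None (* tagged for another version: drop *)
  else
    Some line (* untagged: keep unchanged *)
```

Applied line by line with ~major:(current_major ()), this selects the external declaration on OCaml 4 and the plain let definition on OCaml 3.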

Code-sharing functorized and non-functorized maps

module Concrete = struct
  type ('k, 'v) map = Empty | Node of ...
  let rec min_binding = function ...
  let modify x f cmp map = ...
end

module Make(Ord : OrderedType) =
struct ...
  let min_binding t = Concrete.min_binding t
  let modify x f m = Concrete.modify x f Ord.compare m
end

module PMap = struct (*$< PMap *) ...
  let modify x f m =
    { m with map = Concrete.modify x f m.cmp m.map }
  let min_binding t = Concrete.min_binding t.map
end
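
A reduced, runnable sketch of the same layering. It is an illustration under simplifying assumptions: the tree is unbalanced for brevity, and add stands in for the elided functions of the slide. All tree code lives in Concrete and takes the comparison as an argument; the two interfaces only decide where that comparison comes from.

```ocaml
module Concrete = struct
  type ('k, 'v) map =
    | Empty
    | Node of ('k, 'v) map * 'k * 'v * ('k, 'v) map

  (* insertion, parametrized by the comparison function *)
  let rec add k v cmp = function
    | Empty -> Node (Empty, k, v, Empty)
    | Node (l, k', v', r) ->
      let c = cmp k k' in
      if c = 0 then Node (l, k, v, r)
      else if c < 0 then Node (add k v cmp l, k', v', r)
      else Node (l, k', v', add k v cmp r)

  (* leftmost binding; needs no comparison at all *)
  let rec min_binding = function
    | Empty -> raise Not_found
    | Node (Empty, k, v, _) -> (k, v)
    | Node (l, _, _, _) -> min_binding l
end

(* Functorized interface: the comparison comes from the functor argument. *)
module Make (Ord : Map.OrderedType) = struct
  type 'v t = (Ord.t, 'v) Concrete.map
  let empty : 'v t = Concrete.Empty
  let add k v (m : 'v t) : 'v t = Concrete.add k v Ord.compare m
  let min_binding (m : 'v t) = Concrete.min_binding m
end

(* Polymorphic interface: the comparison is stored next to the tree. *)
module PMap = struct
  type ('k, 'v) t = { cmp : 'k -> 'k -> int; map : ('k, 'v) Concrete.map }
  let create cmp = { cmp; map = Concrete.Empty }
  let add k v m = { m with map = Concrete.add k v m.cmp m.map }
  let min_binding m = Concrete.min_binding m.map
end
```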

Conclusion

Future

The co-existence with Core is embarrassing. Keeping compatibility with the stdlib was a good choice given community practices at Batteries' birth.
Will RWO change that?

 

My personal pet peeve: Module Classes.

Keep maintaining code and being useful to users. (N.B.: there are some users; we have no idea how many.)

Your contribution?

Thanks

(Aaron Gallagher, Alp Mestan, Anders Lau Olsen, Andreas Bogk, Anton Novikov, Ashish Agarwal, ben kuin, Cedric Cellier, Christopher Zimmermann, Daniel Gregoire, David Teller, Dawid Toton, Dmitry Grebeniuk, Edward J. Schwartz, Erick Tryzelaar, Eric Norige, Erkki Seppälä, Francois Berenger, Gabriel, Gabriel Kerneis, Gabriel Scherer, Geoff Hulette, Glyn Webster, Hezekiah M. Carty, jathd, Jérémie Dimino, Justus Matthiesen, Kaustuv Chaudhuri, Kensuke Matsuzaki, Martin Jambon, Mauricio Fernandez, Max Mouratov, Mehdi Dogguy, Michael Ekstrand, Michael Lin, Moncef Baazet, Oleg Tsarev, Paolo Donadeo, Paul Pelzl, Pedro Borges, Peng Zang, Philippe Veber, Roman Sokolov, Rudi Grinberg, Sebastien Mondet, Sergey Plaksin, Serge Ziryukin, Seung Cheol Jung, Simon Castellan, Stefano Zacchiroli, Stephane Glondu, Sylvain Le Gall, Thibault Suzanne, Tiphaine Turpin, Valentin Gatien-Baron, Victor Nicollet, Vincent Hugot, Vladimir Ivanov, Warren Harris, wistery-k, ygrek, ...) as 'a