Hector

hector is an OCaml library that offers vectors (also known as dynamic arrays, or resizeable arrays).

Installation and Usage

Type opam install hector.

In your dune file, add (libraries hector) to the description of your library or executable.

What is a Vector?

A vector is a data structure that holds a sequence of values. A vector is a mutable data structure: it can be updated. Like an array, it offers efficient random access functions, get and set. Furthermore, a vector can grow and shrink; elements can be pushed or popped at its right end.

The logical length of a vector is the length of the sequence.

Furthermore, a vector has a physical capacity. A vector's capacity is at least equal to its length, and can be greater than its length. Almost always, the capacity of a vector is the length of the data array where the vector's elements are stored. As an exception to the previous sentence, if a vector has logical length 0 then its data array can be empty, even if its capacity is nonzero.

Three Flavors of Vectors

Three main implementations of vectors are offered:

  1. The module Hector.Poly offers polymorphic vectors: if elements have type 'a then a vector has type 'a vector. The signature of this module is Hector.POLYVECTOR.
  2. The functor Hector.Mono.Make offers monomorphic vectors: the type of elements is fixed when this functor is applied to a module E, and a vector has type vector. The module produced by this functor has signature Hector.MONOVECTOR with type element = E.t.
  3. The module Hector.Int offers integer vectors: the type of elements is int and a vector has type vector. The signature of this module is Hector.MONOVECTOR with type element = int. This module can be significantly faster than the equivalent module Hector.Mono.Make(Int).

For power users,

  1. The functor Hector.Mono.OutOfArray offers monomorphic vectors. The user provides not only the type of elements, but also a set of basic operations on arrays of elements.

Words of Caution

hector's vectors are not thread-safe and do not include a protection against data races: concurrent accesses to a vector, where at least one thread attempts to modify the vector, are forbidden and can compromise memory safety (that is, they can cause a hard crash or silent memory corruption). Concurrent read accesses are safe. OCaml's Dynarray library also offers non-thread-safe vectors, and guarantees that data races cannot compromise memory safety, whereas hector does not.

hector's vectors do not include a protection against memory leaks. If a vector's capacity is greater than its length, then the logically empty slots in its data array contain stale values, which in the eyes of the garbage collector are reachable. This problem can be avoided by explicitly calling reset or fit_capacity. In contrast, OCaml's Dynarray library does guarantee the absence of memory leaks.

hector's higher-order operations do not detect an attempt to modify a vector while an operation is in progress. For example, in iter f v, the function f must not modify the vector v, but a violation of this rule is not detected by hector. In contrast, OCaml's Dynarray library detects some (albeit not all) violations of this rule at runtime.

Comparison with OCaml's Dynamic Arrays

hector's vectors are similar to those offered by OCaml's Dynarray library.

Compared with hector, OCaml's Dynarray library offers stronger guarantees: see Words of Caution above.

hector's polymorphic vectors and monomorphic vectors are generally slightly faster than OCaml's dynamic arrays, and on some operations, can be up to 2x faster. hector's integer vectors are generally significantly faster than OCaml's dynamic arrays, and on some operations, can be up to 5x faster.

hector offers fast (but dangerous) unchecked random access operations, unsafe_get and unsafe_set, which OCaml's Dynarray library does not have.

Via unsafe_borrow, hector offers direct access to the data array, allowing direct read and write operations; OCaml's Dynarray library does not (and cannot) offer this feature.

hector's stable_sort is slightly faster than Array.stable_sort, and can be significantly faster than Array.stable_sort if the data is already sorted or almost sorted.

A few operations have different behavior in Dynarray and in hector:

  1. hector's compare behaves like List.compare, not like Dynarray.compare. Dynarray.compare implements a preorder on vectors that is not is the lexicographic preorder.
  2. hector allows append v v, which Dynarray forbids.
  3. hector allows applying get_last to an empty vector, which Dynarray forbids.

A few operations exist in Dynarray but not in hector:

  1. hector does not offer to_seq_reentrant and to_seq_rev_reentrant. This is intentional. The semantics of these operations is, in my opinion, unclear. A user who needs these operations can implement them outside hector.
  2. hector does not offer capacity. This is intentional. Because hector does not guarantee that a vector's capacity is equal to the length of its data array, it might be confusing to expose the function capacity.

Several operations exist in hector but not in Dynarray:

  1. sub,
  2. concat,
  3. fill,
  4. blit,
  5. sort, stable_sort, fast_sort,
  6. unsafe_get,
  7. unsafe_set,
  8. unsafe_borrow,
  9. push_array_segment,
  10. iter_down,
  11. find,
  12. show,
  13. the submodule Stack.