XStream: benchmarks

Specification

This pages shows some performance results of XStream compared to other technologies for XML transformation.

We choose a specific transformation task which has been previously used as a benchmark between CDuce and XSLT. The XStream code is:

(* Get elements with a specific tag *)
declare extract(string,_,_)
extract(t, %s[@a x] y,q) when <<s=t>> -> %s[@a x] extract(t,y,q)
extract(t,_[_] y,q) -> extract(t,y,q)
extract(_,(),q) -> q

split(person[@a name[str(n,_)] children[c] _] y,q) ->
  let tag = << match List.assoc "gender" a with "M" -> "man" | _ ->"woman" >> in
  let a' = << ["name",n] >> in   (* The new attribute *)
  let c = split(c,()) in         (* Transform recursively *)
  let s = extract("man",c,()) in (* Split according to gender *)
  let d = extract("woman",c,()) in
  %tag[ @a' sons[s] daughters[d] ] split(y,q)
split(str(_,y),q) -> split(y,q)
split((),q) -> q

main(doc[x] _) -> doc[split(x,())] ()

The input document is assumed to store a database of the descendants of some persons. A person is described as an XML element:

<person gender="...">
  <name>...</name>
  <children>...</children>
</person>

The gender attribute is either "F" or "M" according to whether the person is a woman or a man. The <name> sub-element contains a single string. The <children> sub-element contains one <person> element for each child of the person. The task is to transform such an element into:

<woman name="...">
  <sons>...</sons>
  <daughters>...</daughters>
</woman>

or into <man>...</man> according to the gender of the person. The name is moved from a sub-element into an attribute; the gender information is moved from an attribute into a tag name; the children are split according to their gender and transformed recursively.

Protocol

We wrote the same transformation in the CDuce, XSLT, XQuery and STX languages. For these four languages, we tried several implementation variants and picked the most efficient one for each tool. All these implementations can be found here.

We used four well established XSLT engines: Xalan C++ version 1.10, the XSLTC compiler version 1.4 from the Xalan-Java project (an XSLT-to-Java bytecode compiler), Saxon version 8.7.3, and the xsltproc tool built above the Gnome project's libxml/libxslt libraries.

We used two implementations of XQuery which are reputed to be efficient: Qizx/open version 1.1 and the XQuery engine from Saxon version 8.7.3.

We used Joost version 2006-05-05 which is the most efficient STX implementation available.

We generated random XML documents of various size. They consist of a long sequence of toplevel <person> elements, each one containing a number of descendant (the maximum depth is 6). For each implementation and each input document, we ran the transformation five times, measured the wall-clock time and took the geometrical mean over the three executions. We excluded compilation time for XStream, CDuce and XSLTC (the other implementations are interpreters). The machine used for this benchmark is a Pentium 4 2.80Ghz with 1Gb of RAM. The Java Virtual Machine (used by Qizx/open, Saxon, Xalan) is Sun's J2RE version 1.5.0 with maximum Java heap size set to 512Mb. Saxon was run with the -pull option which gave some noticeable speedup.

XStream's behavior

Of course, XStream is able to process the toplevel <person> elements one by one in a streaming fashion. But it does more: within such a toplevel element, the male children are also processed on the fly, and so are their own male children, and so on.

Results

The throughput results are given in the table below, in Mb per second (higher is better) with respect to the size of the input. A star indicates that the process was killed either by the OS or by the JVM.

Tool Script 1Mb 2Mb 5Mb 10Mb 20Mb 40Mb 80Mb 160Mb 320Mb
XStream split4.xst 5.17 5.30 6.12 6.01 6.25 6.47 7.24 7.23 7.25
CDuce split2.cd 3.38 3.45 4.43 4.24 4.06 3.74 3.35 2.61 *
Saxon (XQuery) split2.saxon.xq 0.31 0.63 1.03 1.25 1.41 1.51 1.58 * *
Qizx/open (XQuery) split3.qizx.xq 0.51 0.76 1.21 1.30 1.43 1.50 1.53 * *
XSLTC (XSLT) split3.xsltc.xsl 0.85 1.41 2.48 3.05 3.41 3.18 2.72 0.88 *
Saxon (XSLT) split3.saxon.xsl 0.42 0.83 1.53 1.94 2.29 2.55 2.76 * *
xsltproc (XSLT) split3.xsltproc.xsl 2.13 2.54 3.43 2.47 1.59 1.04 0.60 * *
Xalan (XSLT) split3.xalan.xsl 1.07 1.23 1.43 1.19 1.08 0.68 0.46 0.24 *
Joost (STX) split.stx 0.35 0.39 0.51 0.53 0.55 0.54 0.57 0.56 0.60

We also measured the maximum memory size used by each implementation. To do this, we fetched every second the VmRSS field (resident set size) from the pseudo file /proc/pid/status (this is the same information as returned by the top command). We ran this benchmark separately from the time measurement in order not to disturb it.

input size: 10Mb 20Mb 40Mb 80Mb
XStream 1.0 1.0 1.0 1.0
CDuce 38.7 69.5 165.8 348.6
Saxon (XQuery) 76.8 134.1 258.8 499.8
Qizx/open (XQuery) 65.8 112.4 205.9 398.7
XSLTC (XSLT) 57.3 102.3 215.0 440.8
Saxon (XSLT) 79.9 134.7 249.3 496.9
xsltproc (XSLT) 109.5 216.3 430.6 852.8
Xalan (XSLT) 42.1 77.9 149.4 290.2
Joost (STX) 16.7 16.7 16.7 16.7

Scripts

We ran various versions of the transformation in order to pick the best implementation for each tool. The results are given below (througput in Mb/s, for a 20Mb input document). For XStream, we choose the most idiomatic version, not the most efficient one.

Thanks to Michael Kay for his help in improving the XSLT versions.

The first STX version was written by Keisuke Nakano. It handles only input documents with a fixed nesting depth (currently 6) and performs the same level of streaming as XStream. The second version works for any nesting depth, but is much slower and streams only toplevel elements.

ToolScript Throughput (Mb/s)
XStream split.xst 6.52
XStream split2.xst 7.09
XStream split3.xst 7.06
XStream split4.xst 7.02
CDuce split.cd 2.87
CDuce split2.cd 3.99
CDuce split3.cd 2.82
Saxon (XQuery) split.saxon.xq 1.37
Saxon (XQuery) split2.saxon.xq 1.41
Saxon (XQuery) split3.saxon.xq 1.00
Qizx/open (XQuery) split.qizx.xq 1.31
Qizx/open (XQuery) split2.qizx.xq 1.37
Qizx/open (XQuery) split3.qizx.xq 1.41
XSLTC (XSLT) split1.xsltc.xsl 3.27
XSLTC (XSLT) split2.xsltc.xsl 3.10
XSLTC (XSLT) split3.xsltc.xsl 3.38
Saxon (XSLT) split1.saxon.xsl 1.95
Saxon (XSLT) split2.saxon.xsl 1.77
Saxon (XSLT) split3.saxon.xsl 2.28
xsltproc split1.xsltproc.xsl 1.12
xsltproc split2.xsltproc.xsl 1.00
xsltproc split3.xsltproc.xsl 1.59
Xalan (XSLT) split1.xalan.xsl 1.07
Xalan (XSLT) split2.xalan.xsl 0.91
Xalan (XSLT) split3.xalan.xsl 1.15
Joost (STX) split.stx 0.43
Joost (STX) split2.stx 0.29