Gagallium : Reading papers on CPS, SSA, ANF and the likes

I have recently started an internship in the Gallium team working on intermediate representation (IR) for compilers. The aim is to get a better understanding of the relation that links “traditional” forms and their functional counterparts.

Many different representations exist, the ones I am focusing on are:

static single assignment (SSA): each variable is assigned to only once (appears once in the left-hand side of an assignment), this makes data-flow tracking easy,
continuation passing style (CPS): a variant of λ-calculus where functions do not return but apply a continuation to their result, and
administrative normal form (ANF): a variant of λ-calculus with flat (not-nested) let bindings.

There are other IRs (such as the monadic intermediate language) which I will not explore.

I started my internship by reading papers on the topic, and coding a tool to help in the exploration of the different IRs. The code is on my github (which is not bad for reasons of OSS code hosted on non OSS-powered servers but because it uses flash).

#An example with SSA and CPS

Here is the dummy pseudo-code example we will be using:

i = 0
j = 10
while j>0 do
  i = i+1
  if i%2 = 0
    j = j-i
  else
    j = j-1
done
return j

Translation into SSA requires the conversion to a control flow graph (CFG; here flattened using labels and jumps/branchs for edges) and placement of ϕ-functions. These ϕ-functions hack around the main SSA restriction: variables can not be assigned to twice. Thus ϕ-functions merge variables coming from different places.

i0 = 0
j0 = 10

label l_while_test
j1 = phi j0 j2 j3
i1 = phi i0 i2
t0 = gt j1 0
branch t0 l_while_body l_while_exit

label l_while_body
i2 = add i1 1
t1 = mod i2 2
t2 = eq t1 0
branch t2 l_if_true l_if_false

label l_if_true
j2 = sub j1 i2
jump l_while_test

label l_if_false
j3 = sub j1 1
jump l_while_test

label l_while_exit
return j1

The notations used in this examples use the following conventions:

assignment to a variable v is noted v = <operator> <operand1> <operand2>,
branching is done via branch <variable> <label1> <label2> and is equivalent to jump <label1> if the variable is true, and jump <label2> otherwise, and
variables that were introduced as a result of the single-assignment constraint preserve their original variable names as prefixes, others are prefixed with t.

The CPS version resemble SSA’s. The main difference is with the ϕ ~~hack~~ ingenious idea. The following table sums up the correspondence:

SSA	CPS
Procedure	λ
Block	λ
ϕ-node	Parameters for a block-λ
Jump/Branch/…	Application

Here is the translated examples. The λs are noted with a \. The names of the labels have been reused to make the correspondence easier to see. (Note that we do not use a complete CPS and allow the use of operators in direct style. On the other hand we kept CPS for if-then-else.)

\ i j k . (
  while_test i j k
  where while_test i j k = (
    if (j>0) (k i j) (while_body i j while_test)
    where while_body i j k = (
      \ i . (
        if (i%2 = 0) (if_true i j k) (if_false i j k)
        where if_true i j k = (k i (j - i))
        and   if_false i j k = (k i (j - 1))
      ) (i+1)
    )
    in
  )
  in
  )
) 0 10 (\ i j k . (k j))

This example might be more difficult to read for those not familiar with λ-calculus or CPS code. Points of interest are:

the ϕs at the while_test-labeled block are translated as parameters to the while_test-named function,
both jumps, returns, and branchs are translated as continuation application
each block is translated as a λ, and
the nesting of λs matches the dominator tree of the control-flow graph of the blocks.

The notion of dominator will not be formally defined here. Let’s just say that “a block b₁ dominates a block b₂ if every path leading to b₂ goes through b₁”. Details can be found in the literature.

#Papers

Here are comments on a few interesting papers related to the aforementioned topic. It is intended for those in need of more details, others might skip to the Code section.

##SSA is Functional Programming

by Appel, 1998, available here.

The article informally explains the links between SSA and functional programming in general. SSA has syntactic constraints that makes it nested-scope-like and thus easy to translate into a functional language. Jumps are mapped to function applications and ϕs to function parameters.

##Compiling Standard ML to Java Bytecodes

by Benton, Kennedy and Russell, 1998, available here.

The article explains the internals of MLj, a compiler from SML97 to Java bytecode. It uses a typed, functional intermediate representation called monadic intermediate language (MIL). Features include: SML-java interoperability (possible foreign function calls) and various optimisations (such as monomorphisation).

MIL is a typed functional language. The type system differentiate values and computations—the latter including effect information. Optimisations based on β and η reduction need a transformation called commuting conversions to avoid let-in nesting.

Towards the end, the compiler transforms (commuting-conversion normalised) MIL into a variant of SSA. This final representation is then transformed into real bytecode.

Note: sources for the compiler are available on the MLj page.

##Compiling with Continuations, Continued

by Kennedy, 2007, available here.

The paper advocates the use of CPS and argues that it is better than either ANF or MIL for use as an intermediate language in compilers. Optimisations apply better on CPS than on alternative in that they are well known λ-calculus transformation and they preserve the language (thus avoiding the need for a normalisation/simplification pass).

The CPS language is formally defined (well-formed-ness and semantics have derivation rules). The translation from a reasonable fragment of ML into CPS is detailed. The CPS language is then extended to include exceptions and recursive functions.

It is made clear that optimisations are all particular cases of β and η rewritings. This contrasts with MIL and ANF in that both require commuting conversion normalisation to be performed between optimisation passes—leading to O(n²) complexity on some optimisation/normalisation interleavings.

##Design and Implementation of Code Optimizations for a Type-Directed Compiler for Standard ML

by Tarditi, 1996, available here.

This PhD thesis explores different aspects of type-directed compilation (using the Typed Intermediate Language (TIL) framework. It uses an IR called LMLI (for λ_i^ML or lambda intermediate ML). The optimisation phase is performed on a subset of LMLI similar to ANF with type annotations.

The following arguments are presented concerning this choice of IR rather than CPS:

the two are theoretically equivalent (there exist an efficient translation from one form to the other),
direct-style (DS) (that is non-CPS)
1. makes it easier to re-use compilation techniques (e.g. from GHC), and
2. is more suited to the set of implemented optimisations, even though
CPS has a more powerful β inlining

Concerning point (2a), it is interesting to note that some compilers (e.g. SML/NJ and some Scheme compilers) use CPS.

Point (2b) is based on the idea that function calls and returns are difficult to distinguish in CPS. This is not necessarily true as some CPS languages have distinct lexical categories for these different types of continuations (e.g. in A Correspondence between Continuation Passing Style and Static Single Assignment Form by Kelsey, 1995)

##Optimizing Nested Loops Using Local CPS Conversion

by Reppy, 2000, available here.

Using a DS IR makes certain loop optimisations difficult (more specifically, nested loops can makes some calls appear as non-tail while they essentially are). Because this is specific to DS, it is proposed to transform appropriate parts of the program to CPS, hence the local nature of CPS conversion.

Detecting the parts that need conversion requires a traversal of the DS IR (and a fixed-point search to support mutually recursive functions) to build the environment associating program variables to the abstract continuations lattice: Γ : var → (label∪{⊤,⊥})

The LCPS conversion offers then the possibility to group functions into clusters. Clusters can then be compiled as would loops: with only one stack frame (however many functions there are) and jump operations. Thus nested loops as well as “mutually tail recursive” (used for state-machine encoding) can be optimised.

Just as pointed by Kennedy in “Compiling with Continuations, Continued”, defects of the DS IR are avoided introducing continuations.

##A Functional Perspective on SSA Optimisation Algorithms

by Chakravarty, Keller and Zadarnowski, 2003, available here.

Mapping the ideas behind the Sparse Conditional Constant Propagation (SCCP) algorithm (see Constant Propagation with Conditional Branches by Wegman and Zadeck, 1991) from SSA to ANF form. The ANF form, the algorithm, and associated proofs are exposed formally.

The new algorithm (SCC_ANF) uses:

an environment Γ mapping the set Var of variables onto the Abs = {⊤, ⊥} ∪ Const lattice—where ⊤ is for variables for which there is no information, and ⊥ is for variables that do not receive concrete values¹,
a working set for remembering the things that remains to be done

and is two fold:

Analysis (looking for the biggest fix-point):
- assign the ⊤ value to every var
- fold over the program modifying the environment and the working set
Specialisation (Using the Analysis’s results):
- fold over the program
  - removing dead-code (functions, not in Γ)
  - replacing the constants by their values
  - replacing non-terminating computations by an infinite loop—with some tricks for keeping side-effects.

#Code

In order to understand the relationship between the different IRs, I’ll try to port algorithms from SSA to CPS (and, if time constraints allow it, ANF). This translation is more interesting than its inverse because:

SSA algorithms are plenty and easy to find, and
λ-calculus has strong theoretical foundations.

Item (a) widens the scope of the study and item (b) makes results easier to interpret (or so I hope).

Because it is easy—once one has the dominator relation computed—to translate SSA into CPS the problem of porting an optimisation Optim_ssa is to close following diagram; that is, finding the Optim_cps that verifies, for any program P:

                          Transl
       P ----------------------------------> Transl(P)
       |                                        ¦
       |                                        ¦
   Optim_ssa                                Optim_cps?
       |                                        ¦
       |                                        ¦
       V                                   ?    V
  Optim_ssa(P) -----> Transl(Optim_ssa(P)) = Optim_cps(Transl(P))
               Transl

Of course the trivial solution Transl⁻¹(Optim_ssa(Transl(.))) is correct but useless as our goal is to understand the relation between SSA and CPS—finding Optim_cps is a mean of exploration. I do not expect every algorithm to be translatable in a pseudo-commuting manner. Pseudo-commutation might only be verified on some particular subset of SSA/CPS.

Additionally, I am writing a parser from LLVM human-readable assembly to my own SSA format so as to be able to use the optimization algorithms available in the LLVM project. My custom SSA form is close enough to LLVM’s. target architecture information, calling convention annotations, and other details are ignored, and some invariants are enforced in the type system.