## Reading papers on CPS, SSA, ANF and the likes

- June 18, 2012

I have recently started an internship in the Gallium team working on intermediate representations (IRs) for compilers. The aim is to get a better understanding of the relation that links “traditional” forms to their functional counterparts.

Many different representations exist; the ones I am focusing on are:

- static single assignment (SSA): each variable is assigned only once (it appears at most once on the left-hand side of an assignment), which makes data-flow tracking easy,
- continuation passing style (CPS): a variant of the *λ*-calculus where functions do not return but instead apply a continuation to their result, and
- administrative normal form (ANF): a variant of the *λ*-calculus with flat (non-nested) `let` bindings.

There are other IRs (such as the *monadic intermediate language*) which I will not explore.

I started my internship by reading papers on the topic, and by coding a tool to help explore the different IRs. The code is on my GitHub (which is bad not because it hosts OSS code on non-OSS-powered servers, but because the site uses Flash).

# An example with SSA and CPS

Here is the dummy pseudo-code example we will be using:

```
i = 0
j = 10
while j>0 do
  i = i+1
  if i%2 = 0
    j = j-i
  else
    j = j-1
done
return j
```
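As a sanity check, the pseudo-code can be transcribed directly into Python; the translations that follow should compute the same value:

```python
def example():
    i = 0
    j = 10
    while j > 0:
        i = i + 1
        if i % 2 == 0:
            j = j - i
        else:
            j = j - 1
    return j

print(example())  # -5
```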

Translation into SSA requires conversion to a control flow graph (CFG; here flattened, using `label`s and `jump`s/`branch`es for edges) and placement of *ϕ*-functions. These *ϕ*-functions hack around the main SSA restriction: a variable cannot be assigned twice. Thus *ϕ*-functions merge variables coming from different places.

```
i0 = 0
j0 = 10
label l_while_test
j1 = phi j0 j2 j3
i1 = phi i0 i2
t0 = gt j1 0
branch t0 l_while_body l_while_exit
label l_while_body
i2 = add i1 1
t1 = mod i2 2
t2 = eq t1 0
branch t2 l_if_true l_if_false
label l_if_true
j2 = sub j1 i2
jump l_while_test
label l_if_false
j3 = sub j1 1
jump l_while_test
label l_while_exit
return j1
```

The notation used in this example follows these conventions:

- assignment to a variable `v` is written `v = <operator> <operand1> <operand2>`,
- branching is done via `branch <variable> <label1> <label2>` and is equivalent to `jump <label1>` if the variable is `true`, and `jump <label2>` otherwise, and
- variables that were introduced as a result of the single-assignment constraint preserve their original variable names as prefixes; the others are prefixed with `t`.
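To make these conventions concrete, here is a small interpreter sketch in Python for the SSA program above (the instruction encoding is my own, not a standard one). The key point is that a `phi` selects its operand according to the predecessor block:

```python
def run_ssa(blocks, entry):
    def val(env, a):  # operands are variable names or literal constants
        return env[a] if isinstance(a, str) else a

    env, label, pred = {}, entry, None
    while True:
        for ins in blocks[label]:
            if ins[0] == "phi":        # merge: pick the operand for the edge taken
                env[ins[1]] = env[ins[2][pred]]
            elif ins[0] == "op":
                _, dest, fn, args = ins
                env[dest] = fn(*(val(env, a) for a in args))
            elif ins[0] == "branch":
                _, cond, l1, l2 = ins
                pred, label = label, (l1 if env[cond] else l2)
                break
            elif ins[0] == "jump":
                pred, label = label, ins[1]
                break
            elif ins[0] == "return":
                return env[ins[1]]

blocks = {
    "entry": [
        ("op", "i0", lambda: 0, []),
        ("op", "j0", lambda: 10, []),
        ("jump", "l_while_test"),
    ],
    "l_while_test": [
        ("phi", "j1", {"entry": "j0", "l_if_true": "j2", "l_if_false": "j3"}),
        ("phi", "i1", {"entry": "i0", "l_if_true": "i2", "l_if_false": "i2"}),
        ("op", "t0", lambda a, b: a > b, ["j1", 0]),
        ("branch", "t0", "l_while_body", "l_while_exit"),
    ],
    "l_while_body": [
        ("op", "i2", lambda a, b: a + b, ["i1", 1]),
        ("op", "t1", lambda a, b: a % b, ["i2", 2]),
        ("op", "t2", lambda a, b: a == b, ["t1", 0]),
        ("branch", "t2", "l_if_true", "l_if_false"),
    ],
    "l_if_true": [("op", "j2", lambda a, b: a - b, ["j1", "i2"]), ("jump", "l_while_test")],
    "l_if_false": [("op", "j3", lambda a, b: a - b, ["j1", 1]), ("jump", "l_while_test")],
    "l_while_exit": [("return", "j1")],
}
print(run_ssa(blocks, "entry"))  # -5
```

Note how the straight-line prelude is given its own `entry` block so that the `phi`s have a well-defined predecessor on the first iteration.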

The CPS version resembles SSA’s. The main difference lies in the *ϕ* ~~hack~~ ingenious idea. The following table sums up the correspondence:

| SSA | CPS |
|---|---|
| Procedure | λ |
| Block | λ |
| ϕ-node | Parameters for a block-λ |
| Jump/Branch/… | Application |

Here is the translated example. The *λ*s are written with a `\`. The names of the labels have been reused to make the correspondence easier to see. (Note that we do not use a complete CPS translation and allow the use of operators in direct style. On the other hand, we kept CPS for `if-then-else`.)

```
(\ i j k . (
  while_test i j k
  where while_test i j k = (
    if (j>0) (k i j) (while_body i j (\ i j . while_test i j k))
    where while_body i j k = (
      \ i . (
        if (i%2 = 0) (if_true i j k) (if_false i j k)
        where if_true i j k = (k i (j - i))
        and if_false i j k = (k i (j - 1))
      ) (i+1)
    )
    in
  )
  in
)) 0 10 (\ i j . j)
```
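One way to check the reading is to transcribe the CPS structure with Python closures. This is a sketch in which I made the continuation arities uniform (every continuation takes the pair `i`, `j`), since the pseudo-code is loose about them:

```python
def program(i, j, k):
    def while_test(i, j):
        if j > 0:
            return while_body(i, j, while_test)  # loop: continue at the test
        return k(i, j)                           # exit: apply the final continuation

    def while_body(i, j, k):
        def if_true(i, j):
            return k(i, j - i)
        def if_false(i, j):
            return k(i, j - 1)
        def branch(i):  # the inner "\ i . (...)" applied to i+1
            return if_true(i, j) if i % 2 == 0 else if_false(i, j)
        return branch(i + 1)

    return while_test(i, j)

print(program(0, 10, lambda i, j: j))  # -5
```

Here `while_test` closes over the outer `k` instead of receiving it on every call, which plays the role of the `\ i j . while_test i j k` continuation in the listing.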

This example might be more difficult to read for those not familiar with *λ*-calculus or CPS code. Points of interest are:

- the *ϕ*s at the `while_test`-labeled block are translated as parameters of the `while_test`-named function,
- `jump`s, `return`s, and `branch`es are all translated as continuation applications,
- each block is translated as a *λ*, and
- the nesting of *λ*s matches the *dominator* tree of the control-flow graph of the blocks.

The notion of dominator will not be formally defined here. Let’s just say that “a block *b*_{1} dominates a block *b*_{2} if every path leading to *b*_{2} goes through *b*_{1}”. Details can be found in the literature.
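For completeness, dominator sets can be computed with the classic iterative data-flow algorithm. Here is a sketch in Python, run on the CFG of the example (block names as in the SSA listing, plus an `entry` node for the straight-line prelude):

```python
def dominators(succs, entry):
    """dom[b]: the set of blocks dominating b (iterative fix-point)."""
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

cfg = {
    "entry": ["l_while_test"],
    "l_while_test": ["l_while_body", "l_while_exit"],
    "l_while_body": ["l_if_true", "l_if_false"],
    "l_if_true": ["l_while_test"],
    "l_if_false": ["l_while_test"],
    "l_while_exit": [],
}
dom = dominators(cfg, "entry")
print(sorted(dom["l_if_true"]))  # ['entry', 'l_if_true', 'l_while_body', 'l_while_test']
```

The output confirms that `l_while_body` dominates both branch blocks, which is why the corresponding *λ*s nest inside `while_body` in the CPS listing.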

# Papers

Here are comments on a few interesting papers related to the aforementioned topic. This section is intended for those in need of more details; others might skip to the Code section.

## SSA is Functional Programming

by Appel, 1998, available here.

The article informally explains the links between SSA and functional programming in general. SSA has syntactic constraints that make it nested-scope-like and thus easy to translate into a functional language. Jumps are mapped to function applications and *ϕ*s to function parameters.

## Compiling Standard ML to Java Bytecodes

by Benton, Kennedy and Russell, 1998, available here.

The article explains the internals of MLj, a compiler from SML97 to Java bytecode. It uses a typed, functional intermediate representation called monadic intermediate language (MIL). Features include: SML-java interoperability (possible foreign function calls) and various optimisations (such as monomorphisation).

MIL is a typed functional language. The type system differentiates values from computations, the latter carrying effect information. Optimisations based on *β*- and *η*-reduction need a transformation called *commuting conversions* to avoid `let`-`in` nesting.

Towards the end, the compiler transforms (commuting-conversion normalised) MIL into a variant of SSA. This final representation is then transformed into real bytecode.

Note: sources for the compiler are available on the MLj page.

## Compiling with Continuations, Continued

by Kennedy, 2007, available here.

The paper advocates the use of CPS and argues that it is better than either ANF or MIL for use as an intermediate language in compilers. Optimisations apply better to CPS than to the alternatives in that they are well-known *λ*-calculus transformations and they preserve the language (thus avoiding the need for a normalisation/simplification pass).

The CPS language is formally defined (well-formedness and semantics are given by derivation rules). The translation of a reasonable fragment of ML into CPS is detailed. The CPS language is then extended to include exceptions and recursive functions.

It is made clear that optimisations are all particular cases of *β* and *η* rewritings. This contrasts with MIL and ANF in that both require commuting conversion normalisation to be performed between optimisation passes—leading to *O*(*n*^{2}) complexity on some optimisation/normalisation interleavings.
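As an illustration of the point (and not Kennedy's actual formalism), inlining as a plain *β*-step can be sketched on a toy term representation:

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", f, arg) | ("const", n)

def subst(term, x, value):
    """Capture-naive substitution (fine here: all binders are distinct)."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == x else term
    if kind == "lam":
        y, body = term[1], term[2]
        return term if y == x else ("lam", y, subst(body, x, value))
    if kind == "app":
        return ("app", subst(term[1], x, value), subst(term[2], x, value))
    return term  # constants

def beta(term):
    """One top-level β-step: (λx. body) v  →  body[x := v]."""
    if term[0] == "app" and term[1][0] == "lam":
        _, (_, x, body), arg = term
        return subst(body, x, arg)
    return term

t = ("app", ("lam", "x", ("app", ("var", "k"), ("var", "x"))), ("const", 42))
print(beta(t))  # ('app', ('var', 'k'), ('const', 42))
```

The result of the rewrite is again a term of the same language, which is exactly the property the paper exploits: no separate normalisation pass is needed after inlining.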

## Design and Implementation of Code Optimizations for a Type-Directed Compiler for Standard ML

by Tarditi, 1996, available here.

This PhD thesis explores different aspects of type-directed compilation using the Typed Intermediate Language (TIL) framework. It uses an IR called LMLI (for *λ*_{i}^{ML}, lambda intermediate ML). The optimisation phase is performed on a subset of LMLI similar to ANF with type annotations.

The following arguments are presented concerning this choice of IR rather than CPS:

- (1) the two are theoretically equivalent (there exists an efficient translation from one form to the other), and
- (2) direct style (DS) (that is, non-CPS)
  - (2a) makes it easier to re-use compilation techniques (e.g. from GHC), and
  - (2b) is more suited to the set of implemented optimisations, even though CPS has a more powerful *β*-inlining.

Concerning point (2a), it is interesting to note that some compilers (e.g. SML/NJ and some Scheme compilers) use CPS.

Point (2b) is based on the idea that function calls and returns are difficult to distinguish in CPS. This is not necessarily true, as some CPS languages have distinct lexical categories for these different kinds of continuations (e.g. in *A Correspondence between Continuation Passing Style and Static Single Assignment Form* by Kelsey, 1995).

## Optimizing Nested Loops Using Local CPS Conversion

by Reppy, 2000, available here.

Using a DS IR makes certain loop optimisations difficult (more specifically, nested loops can make some calls appear non-tail even though they essentially are tail calls). Because this problem is specific to DS, it is proposed to transform the appropriate parts of the program to CPS, hence the *local* nature of the CPS conversion.

Detecting the parts that need conversion requires a traversal of the DS IR (and a fixed-point search to support mutually recursive functions) to build the environment associating program variables to the abstract continuation lattice: *Γ* : *Var* → (*Label* ∪ {⊤, ⊥}).
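The lattice operation involved can be sketched as follows (my own encoding, which follows the paper only loosely): joining the labels of the call sites observed for a variable yields either a unique label, or ⊤ when there are several.

```python
TOP, BOT = "top", "bot"  # ⊤: no unique continuation, ⊥: nothing observed yet

def join(a, b):
    """Least upper bound in the Label ∪ {⊤, ⊥} lattice."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    if a == b:
        return a
    return TOP  # two distinct labels: the continuation is not unique

# accumulation over (hypothetical) observed continuations per variable
gamma = {}
for var, label in [("f", "L1"), ("g", "L2"), ("f", "L1"), ("g", "L3")]:
    gamma[var] = join(gamma.get(var, BOT), label)
print(gamma)  # {'f': 'L1', 'g': 'top'}
```

Roughly, only functions whose entry maps to a single label (here `f`) are candidates for local CPS conversion into their caller's cluster.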

The LCPS conversion then offers the possibility to group functions into *clusters*. A cluster can be compiled as a loop would be: with a single stack frame (however many functions there are) and jump operations. Thus nested loops as well as “mutually tail recursive” functions (used for state-machine encoding) can be optimised.

Just as pointed out by Kennedy in “Compiling with Continuations, Continued”, the defects of the DS IR are avoided by introducing continuations.

## A Functional Perspective on SSA Optimisation Algorithms

by Chakravarty, Keller and Zadarnowski, 2003, available here.

The paper maps the ideas behind the Sparse Conditional Constant Propagation (SCCP) algorithm (see *Constant Propagation with Conditional Branches* by Wegman and Zadeck, 1991) from SSA form to ANF. The ANF form, the algorithm, and the associated proofs are presented formally.

The new algorithm (SCC_{ANF}) uses:

- an environment *Γ* mapping the set *Var* of variables onto the lattice *Abs* = {⊤, ⊥} ∪ *Const*, where ⊤ is for variables about which there is no information and ⊥ is for variables that do not receive concrete values^{1}, and
- a working set for remembering what remains to be done,

and is two-fold:

- Analysis (looking for the biggest fix-point):
  - assign the ⊤ value to every variable,
  - fold over the program, modifying the environment and the working set.
- Specialisation (using the Analysis’s results): fold over the program,
  - removing dead code (functions not in *Γ*),
  - replacing constants by their values, and
  - replacing non-terminating computations by an infinite loop (with some tricks for keeping side-effects).
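The Analysis phase can be sketched on a loop-free toy fragment (straight-line bindings only, not the paper's full ANF; with loops, the fold would have to be iterated to a fix-point):

```python
TOP, BOT = "top", "bot"  # ⊤: no information, ⊥: no concrete value

def eval_abs(expr, gamma):
    """Abstract evaluation over Abs = {⊤, ⊥} ∪ Const."""
    op, a, b = expr
    x, y = gamma.get(a, a), gamma.get(b, b)  # names look up Γ, literals pass through
    if TOP in (x, y):
        return TOP
    if BOT in (x, y):
        return BOT
    return {"add": x + y, "mul": x * y}[op]

# one analysis pass over a loop-free program (no fix-point needed here)
bindings = [("x", ("add", 1, 2)), ("y", ("mul", "x", "x")), ("z", ("add", "y", "unknown"))]
gamma = {"unknown": TOP}  # a free variable about which we know nothing
for var, expr in bindings:
    gamma[var] = eval_abs(expr, gamma)
print(gamma)  # {'unknown': 'top', 'x': 3, 'y': 9, 'z': 'top'}
```

Specialisation would then rewrite the bindings of `x` and `y` to the constants `3` and `9`, and leave `z` alone.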

# Code

In order to understand the relationship between the different IRs, I’ll try to port algorithms from SSA to CPS (and, if time constraints allow it, to ANF). This direction of translation is more interesting than its inverse because:

- (a) SSA algorithms are plentiful and easy to find, and
- (b) *λ*-calculus has strong theoretical foundations.

Item (a) widens the scope of the study and item (b) makes results easier to interpret (or so I hope).

Because it is easy (once the dominator relation is computed) to translate SSA into CPS, the problem of porting an optimisation *Optim*_{ssa} is to close the following diagram; that is, to find the *Optim*_{cps} that verifies, for any program *P*:

```
                 Transl
P ----------------------------------> Transl(P)
|                                         ¦
|                                         ¦
Optim_ssa                             Optim_cps?
|                                         ¦
|                                         ¦
V                  ?                      V
Optim_ssa(P) -----> Transl(Optim_ssa(P)) = Optim_cps(Transl(P))
           Transl
```

Of course the trivial solution *Transl*^{−1}(*Optim*_{ssa}(*Transl*(.))) is correct but useless, as our goal is to understand the relation between SSA and CPS; finding *Optim*_{cps} is a means of exploration. I do not expect every algorithm to be translatable in such a pseudo-commuting manner. Pseudo-commutation might only be verified on some particular subsets of SSA/CPS.
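On a drastically simplified pair of IRs, the diagram can even be phrased as an executable check. Here is a sketch (straight-line code only, with constant folding as the optimisation on both sides) of the equality the diagram demands:

```python
def transl(prog, result):
    """Translate an assignment list (SSA-ish) into nested lets (CPS/ANF-ish)."""
    term = ("ret", result)
    for var, expr in reversed(prog):
        term = ("let", var, expr, term)
    return term

def fold_expr(expr, env):
    op, a, b = expr
    a, b = env.get(a, a), env.get(b, b)  # replace known variables by constants
    if isinstance(a, int) and isinstance(b, int):
        return {"add": a + b, "mul": a * b}[op]
    return (op, a, b)

def optim_ssa(prog):  # constant folding over the flat form
    env, out = {}, []
    for var, expr in prog:
        e = fold_expr(expr, env)
        if isinstance(e, int):
            env[var] = e
        out.append((var, e))
    return out

def optim_cps(term, env={}):  # the same folding over the nested form (env is read-only)
    if term[0] == "ret":
        return term
    _, var, expr, body = term
    e = fold_expr(expr, env)
    new_env = dict(env, **{var: e}) if isinstance(e, int) else env
    return ("let", var, e, optim_cps(body, new_env))

P = [("x", ("add", 1, 2)), ("y", ("mul", "x", 4)), ("z", ("add", "y", "a"))]
print(transl(optim_ssa(P), "z") == optim_cps(transl(P, "z")))  # True
```

The interesting cases are of course the ones with control flow, where such a naive commutation is not expected to hold as-is.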

Additionally, I am writing a parser from LLVM human-readable assembly to my own SSA format, so as to be able to use the optimisation algorithms available in the LLVM project. My custom SSA form is close enough to LLVM’s: target architecture information, calling-convention annotations, and other details are ignored, and some invariants are enforced in the type system.

1. Absence of a concrete value can be due either to dead code or to non-terminating computations.↩