Core ML

We first present a few examples, insisting on the functional aspect of the language. Then, we formalize an extremely small subset of the language, which, surprisingly, contains in itself the essence of ML. Last, we show how to derive other constructs remaining in core ML whenever possible, or making small extensions when necessary.

1.1 Discovering Core ML

Core ML is a small functional language. This means that functions are taken seriously, eg. they can be passed as arguments to other functions or returned as results. We also say that functions are first-class values.

In principle, the notion of a function relates as closely as possible to the one that can be found in mathematics. However, there are also important differences, because objects manipulated by programs are always countable (and finite in practice). In fact, core ML is based on the lambda-calculus, which has been invented by Church to model computation.

Syntactically, expressions of the lambda-calculus (written with letter a) are of three possible forms: variables x, which are given as elements of a countable set, functions λ x. a, or applications a₁ a₂. In addition, core ML has a distinguished construction let x = a₁ in a₂ used to bind an expression a₁ to a variable x within an expression a₂ (this construction is also used to introduce polymorphism, as we will see below). Furthermore, the language ML comes with primitive values, such as integers, floats, strings, etc. (written with letter c) and functions over these values.

Finally, a program is composed of a sequence of sentences that can optionally be separated by double semi-colon “;;”. A sentence is a single expression or the binding, written let x = a, of an expression a to a variable x.

In normal mode, programs can be written in one or more files, separately compiled, and linked together to form an executable machine code (see Section 4.1.1). However, in the core language, we may assume that all sentences are written in a single file; furthermore, we may replace ;; by in turning the sequence of sentences into a single expression. The language OCaml also offers an interactive loop in which sentences entered by the user are compiled and executed immediately; then, their results are printed on the terminal.

Note

We use the interactive mode to illustrate most of the examples. The input sentences are closed with a double semi-colons “;;”. All programs and the output of the interpreter (when useful) are displayed in a pink background. Input is marked with a vertical bar on the left margin, usually in green, except for incorrect programs that that uses a red mark. Here is an example. Some larger examples, called implementation notes, are delimited by horizontal braces as illustrated right below:

Implementation notes are delimited as this one. They contain explanations in English (not in OCaml comments) and several OCaml phrases.

All phrases of a note belong to the same file (this one belong to README) and are meant to be compiled (rather than interpreted).

The execution of the first phrase prints the string "Hello\n" to the terminal. The system indicates that the result of the evaluation is of type unit. The evaluation of the second phrase binds the intermediate result of the evaluation of the expression 4.0 * atan 1.0, that is the float 3.14..., to the variable pi. This execution does not produce any output; the system only prints the type information and the value that is bound to pi. The last phrase defines a function that takes a parameter x and returns the product of x and itself. Because of the type of the binary primitive operation *., which is float -> float -> float, the system infers that both x and the the result square x must be of type float. A mismatch between types, which often reveals a programmer's error, is detected and reported:

Function definitions may be recursive, provided this is requested explicitly, using the keyword rec:

Functions can be passed to other functions as argument, or received as results, leading to higher-functions also called functionals. For instance, the composition of two functions can be defined exactly as in mathematics:

The best illustration OCaml of the power of functions might be the function “power” itself!

Here, the expression (fun x -> x) is the anonymous identity function. Extending the parallel with mathematics, we may define the derivative of an arbitrary function f. Since we use numerical rather than formal computation, the derivative is parameterized by the increment step dx:

Then, the third derivative sin''' of the sinus function can be obtained by computing the cubic power of the derivative function and applying it to the sinus function. Last, we calculate its value for the real pi.

This capability of functions to manipulate other functions as one would do in mathematics is almost unlimited... modulo the running time and the rounding errors.

1.2 The syntax of Core ML

Before continuing with more features of OCaml, let us see how a very simple subset of the language can be formalized.

In general, when giving a formal presentation of a language, we tend to keep the number of constructs small by factoring similar constructs as much as possible and explaining derived constructs by means of simple translations, such as syntactic sugar.

For instance, in the core language, we can omit phrases. That is, we transform sequences of bindings such as let x₁ = a₁;; let x₂ = a₂;; a into expressions of the form let x₁ = a₁ in let x₂ = a₂ in a. Similarly, numbers, strings, but also lists, pairs, etc. as well as operations on those values can all be treated as constants and applications of constants to values.

Formally, we assume a collection of constants c ∈ C that are partitioned into constructors C ∈ C⁺ and primitives f ∈ C⁻. Constants also come with an arity, that is, we assume a mapping arity from C to IN. For instance, integers and booleans are constructors of arity 0, pair is a constructor of arity 2, arithmetic operations, such as + or × are primitives of arity 2, and not is a primitive of arity 1. Intuitively, constructors are passive: they may take arguments, but should ignore their shape and simply build up larger values with their arguments embedded. On the opposite, primitives are active: they may examine the shape of their arguments, operate on inner embedded values, and transform them. This difference between constants and primitives will appear more clearly below, when we define their semantics. In summary, the syntax of expressions is given below:

Expressions can be represented in OCaml by their abstract-syntax trees, which are elements of the following data-type expr:

Of course, a full implementation should also provide a lexer and a parser, so that the expression e could be entered using the concrete syntax (λ x. x * x) ((λ x. x+1) 2) and be automatically transformed into the abstract syntax tree above.

1.3 The dynamic semantics of Core ML

Giving the syntax of a programming language is a prerequisite to the definition of the language, but does not define the language itself. The syntax of a language describes the set of sentences that are well-formed expressions and programs that are acceptable inputs. However, the syntax of the language does not determine how these expressions are to be computed, nor what they mean. For that purpose, we need to define the semantics of the language.

(As a counter example, if one uses a sample of programs only as a pool of inputs to experiment with some pretty printing tool, it does not make sense to talk about the semantics of these programs.)

There are two main approaches to defining the semantics of programming languages: the simplest, more intuitive way is to give an operational semantics, which amounts to describing the computation process. It relates programs —as syntactic objects— between one another, closely following the evaluation steps. Usually, this models rather fairly the evaluation of programs on real computers. This level of description is both appropriate and convenient to prove properties about the evaluation, such as confluence or type soundness. However, it also contains many low-level details that makes other kinds of properties harder to prove. This approach is somehow too concrete —it is sometimes said to be “too syntactic”. In particular, it does not explain well what programs really are.

The alternative is to give a denotational semantics of programs. This amounts to building a mathematical structure whose objects, called domains, are used to represent the meanings of programs: every program is then mapped to one of these objects. The denotational semantics is much more abstract. In principle, it should not use any reference to the syntax of programs, not even to their evaluation process. However, it is often difficult to build the mathematical domains that are used as the meanings of programs. In return, this semantics may allow to prove difficult properties in an extremely concise way.

The denotational and operational approaches to semantics are actually complementary. Hereafter, we only consider operational semantics, because we will focus on the evaluation process and its correctness.

In general, operational semantics relates programs to answers describing the result of their evaluation. Values are the subset of answers expected from normal evaluations.

A particular case of operational semantics is called a reduction semantics. Here, answers are a subset of programs and the semantic relation is defined as the transitive closure of a small-step internal binary relation (called reduction) between programs.

The latter is often called small-step style of operational semantics, sometimes also called Structural Operational Semantics [61]. The former is big-step style, sometimes also called Natural Semantics [39].

1.3.1 Reduction semantics

The call-by-value reduction semantics for ML is defined as follows: values are either functions, constructed values, or partially applied constants; a constructed value is a constructor applied to as many values as the arity of the constructor; a partially applied constant is either a primitive or a constructor applied to fewer values than the arity of the constant. This is summarized below, writing v for values:

In fact, a partially applied constant cⁿ v₁ … v_k behaves as the function λ x_k+1. …λ x_n. c^k v₁ … v_k x_k+1 … x_n, with k<n. Indeed, it is a value.

Since values are subsets of programs, they can be characterized by a predicate evaluated defined on expressions:

The small-step reduction is defined by a set of redexes and is closed by congruence with respect to evaluations contexts.

Redexes describe the reduction at the place where it occurs; they are the heart of the reduction semantics:

Redexes of the latter form, which describe how to reduce primitives, are also called delta rules. We write δ for the union ∪_{f ∈
C⁻} (δ_f). For instance, the rule (δ₊) is the relation {(p + q, p+q) ∣ p, q ∈ IN} where n is the constant representing the integer n.

Redexes are partial functions from programs to programs. Hence, they can be represented as OCaml functions, raising an exception Reduce when there are applied to values outside of their domain. The δ-rules can be implemented straightforwardly.

To implement (β_v), we first need an auxiliary function that substitutes a variable for a value in a term. Since the expression to be substituted will always be a value, hence closed, we do not have to perform α-conversion to avoid variable capture.

The evaluation contexts E describe the occurrences inside programs where the reduction may actually occur. In general, a (one-hole) context is an expression with a hole —which can be seen as a distinguished constant, written [·]— occurring exactly once. For instance, λ x. x [·] is a context. Evaluation contexts are contexts where the hole can only occur at some admissible positions that often described by a grammar. For ML, the (call-by-value) evaluation contexts are:

We write E[a] the term obtained by filling the expression a in the evaluation context E (or in other words by replacing the constant [·] by the expression a).

Finally, the small-step reduction is the closure of redexes by the congruence rule:

The evaluation relation is then the transitive closure →^* of the small step reduction →. Note that values are irreducible, indeed.

There are several ways to treat evaluation contexts in practice. The most standard solution is not to represent them, ie. to represent them as evaluation contexts of the host language, using its run-time stack. Typically, an evaluator would be defined as follows:

The function eval visits the tree top-down. On the descent it evaluates all subterms that are not values in the order prescribed by the evaluation contexts; before ascent, it replaces subtrees bu their evaluated forms. If this succeeds it recursively evaluates the reduct; otherwise, it simply returns the resulting expression.

This algorithm is efficient, since the input term is scanned only once, from the root to the leaves, and reduced from the leaves to the root. However, this optimized implementation is not a straightforward implementation of the reduction semantics.

If efficiency is not an issue, the step-by-step reduction can be recovered by a slight change to this algorithm, stopping reduction after each step.

Here, contexts are still implicit, and redexes are immediately reduced and put back into their evaluation context. However, the eval_step function can easily be decomposed into three operations: eval_context that returns an evaluation context and a term, the reduction per say, and the reconstruction of the result by filling the result of the reduction back into the evaluation context. The simplest representation of contexts is to view them as functions form terms to terms as follows:

Then, the following function split a term into a pair of an evaluation context and a term.

Finally, it the one-step reduction rewrites the term as a pair E [a] of an evaluation context E and a term t, apply top reduces the term a to a', and returns E [a], exactly as the formal specification.

The reduction function is obtain from the one-step reduction by iterating the process until no more reduction applies.

This implementation of reduction closely follows the formal definition. Of course, it is less efficient the direct implementation. Exercise 1 presents yet another solution that combines small step reduction with an efficient implementation.

Remark 1 The following rule could be taken as an alternative for (Let_v).

let x = v in a → (λ x. a) v

Observe that the right hand side can then be reduced to a [x ← v] by (β_v). We chose the direct form, because in ML, the intermediate form would not necessarily be well-typed.

Example 1 The expression (λ x. (x * x)) ((λ x. (x + 1)) 2) is reduced to the value 9 as follows (we underline the sub-term to be reduced):

	(λ x. (x * x)) ((λ x. (x + 1))) 2)
→	(λ x. (x * x)) (2 + 1)	(β_v)
→	(λ x. (x * x)) 3	(δ₊)
→	(3 * 3)	(β_v)
→	9	(δ_*)

We can check this example by running it through the evaluator:

eval e;;

- : expr = Const {name=Int 9; constr=true; arity=0}

Exercise 1 ((**) Representing evaluation contexts) Evaluation contexts are not explicitly represented above. Instead, they are left implicit from the runtime stack and functions from terms to terms. In this exercise, we represent evaluation contexts explicitly into a dedicated data-structure, which enables to examined them by pattern matching.

In fact, it is more convenient to hold contexts by their hole—where reduction happens. To this aim, we represent them upside-down, following Huet's notion of zippers [32]. Zippers are a systematic and efficient way of representing every step while walking along a tree. Informally, the zipper is closed when at the top of the tree; walking down the tree will open up the top of the zipper, turning the top of the tree into backward-pointers so that the tree can be rebuilt when walking back up, after some of the subtrees might have been changed.

Actually, the zipper definition can be read from the formal BNF definition of evaluations contexts:

E ::= [·] ∣ E a ∣ v E ∣ let x = E in a

The OCaml definition is:

type context = | Top | AppL of context * expr | AppR of value * context | LetL of string * context * exprand value = int * expr

The left argument of constructor AppR is always a value. A value is a expression of a certain form. However, the type system cannot enfore this invariant. For sake of efficiency, values also carry their arity, which is the number of arguments a value must be applied to before any reduction may occur. For instance, a constant of arity k is a value of arity k. A function is a value of arity 1. Hence, a fully applied contructor such as 1 will be given an strictly positive arity, eg. 1.

Note that the type context is linear, in the sense that constructors have at more one context subterm. This leads to two opposite representations of contexts. The naive representation of context let x = [·] a₂ in a3 is LetL (x, AppL (Top, a2)), a3). However, we shall represent them upside-down by the term AppL (LetL (x, Top, a3), a2), following the idea of zippers —this justifies our choice of Top rather than Hole for the empty context. This should read “a context where the hole is below the left branch of an application node whose right branch is a₃ and which is itself (the left branch of) a binding of x whose body is a₂ and which is itself at the top”.

A term a₀ can usually be decomposed as a one hole context E [a] in many ways if we do not impose that a is a reducible. For instance, taking (a₁ a₂) a₃, allows the following decompositions

[·] [let x = a₁ a₂ in a₃]

(let x= [·] in a₃) [a₁ a₂]

(let x = [·] a₂ in a₃) [a₁]

(let x = a₁ [·] in a₃) [a₂]

(The last decompistion is correct only when a₁ is a value.) These decompositions can be described by a pair whose left-hand side is the context and whose right-hand side is the term to be placed in the hole of the context:

Top, Let (x, App (a1, a2), a3)LetL (x, Top, a3), App (a1, a2)AppL (LetL (Top, a2), a3), a1AppR ((k, a1), LetL (Top, a3)) a2

They can also be represented graphically:

As shown in the graph, the different decompositions can be obtained by zipping (push some of the term structure inside the context) or unzipping (popping the structure from the context back to the term). This allows a simple change of focus, and efficient exploration and transformation of the region (both up the context and down the term) at the junction.

Give a program context_fill of type context * expr -> expr that takes a decomposition (E, a) and returns the expression E[a].

Chapter 1 Core ML

1.1 Discovering Core ML

Note

1.2 The syntax of Core ML

1.3 The dynamic semantics of Core ML

1.3.1 Reduction semantics

1.3.2 Properties of the reduction

Classifying evaluations of programs

1.3.3 Big-step operational semantics

1.4 The static semantics of Core ML

1.4.1 Types and programs

1.4.2 Type inference

1.4.3 Unification for simple types

1.4.4 Polymorphism

1.5 Recursion

1.5.1 Fix-point combinator

Mutually recursive definitions

Recursion and polymorphism

1.5.2 Recursive types

1.5.3 Type inference v.s. type checking

Further reading