Making it easier for beginners to learn OCaml
- June 21, 2013
- Last updated on 2013/06/24
This blog post is from Arthur Charguéraud, whom we recently had the pleasure to see again in Rocquencourt. It comes from a recent submission to the OCaml 2013 Workshop, and is a discussion of some things that could be improved to make OCaml a better teaching language.
Of course, while everyone agrees in principle that tooling is good
and that making the language simple to learn (and teach) is a noble
cause, the details of the proposal are controversial. We hope that our
dear readers will understand this as a first position statement to
stimulate discussion and contribution, rather than a cause for outrage –
although some here would agree that suggesting !r
as a
lvalue is somewhat heretic.
Making it easier for beginners to learn OCaml
Unless having a private tutor, learning OCaml as a first programming language can be extremely frustrating at first, if not completely discouraging. Based on a long experience of teaching OCaml, based on the analysis of a large database of failing-to-compile OCaml source code, and based on the recent arguments put forward for justifying the use of Python instead of OCaml for teaching programming, we argue that it would be possible, with relatively little effort, to make OCaml significantly more accessible to beginners. More precisely, we describe parsing tools, minor syntax extensions, type-checking tools and libraries that, altogether, should make OCaml much easier to learn.
Problems
We have observed that OCaml is really hard for beginners to learn without help. The main cause is the terribly poor reporting of static and dynamic errors. We describe a few striking examples below.
Syntax errors are often reported near the end of a top-level definition although the error is actually located in the middle of the definition. This situation occurs in particular when replacing a
in
keyword with a semicolon, a very frequent mistake.The instruction contained in the
then
branch of a conditional cannot be followed by a semicolon, contrary to the instruction in theelse
branch (and to instructions in regular sequences). The logic of semicolon being a separator and not a terminator is totally incomprehensible for beginners.Very early in their OCaml programming experience, beginners face type errors of the form “foo has type
int -> int -> int
whereas it is expected to be of typeint
”, as a consequence of typos or missing parentheses. The problem is that they encounter this kind of error messages at a point where they do not know anything about arrow types or even about functions.Exceptions such as “array index out of bounds” are reported without a mention of the index involved, nor of the size of the array, nor of the line at which the incorrect array access has been performed. This lack of details slows down debugging. (Using a full-featured debugger is often not an option at this stage.)
Besides our in-situ teaching experience, we have been teaching OCaml through a (French-speaking) website that features online evaluation of exercises (www.france-ioi.org).
Our analysis of submitted programs confirms that the kind of issues reported above are extremely frequent and very time-consuming for beginners to solve on their own. We observed that a significant number of beginners ended up giving up the OCaml tutorial, especially when they were not in a position to obtain help from a nearby OCaml guru.
In addition to the issues described above, which are well-known to everyone who has taught programming using OCaml, we can also learn about the limitations of OCaml from teachers who have introduced their students to programming using other languages. In particular, we have observed a recent and significant trend to use Python as a first programming language. When told about the Python trend, OCaml experts usually express their disappointment through a sentence of the kind: “What!? Python?! But it does not even have static types! Not to mention that variable scoping is broken!”. Despite the wiseness captured by this comment, a number of qualified teachers and researchers have put forwards a collection of arguments to justify the choice of Python over OCaml.
Python provides a simple way to execute a program (
python source.py
), avoiding the need to first compile the program and then execute it. OCaml supports a somewhat similar feature (ocaml source.ml
), however this feature is not advertised and, even worse, it executes top-level definitions one by one before type-checking the entire source file.Python, unlike OCaml, features a generic printing function, which is very handy for debugging. Similarly, Python features a very-simple-to-use function that one can call to print a backtrace.
Python avoids the awkward floating point operators such as
(+.)
instead of(+)
, which is the conventional mathematical notation and is used by nearly all other programming languages.Python has a traditional treatment of variables that are mutable by default, avoiding beginners to be confused as to when to use an
int
or anint ref
, and to face unclear type error messages when forgetting the(!)
operator.Python features unbounded integers, which saves the need to introduce beginners very early (too early) in the programming course to the notion of fixed-size representation and overflows.
Python offers simple-to-use implementations of maps, whereas in OCaml, obtaining a structure more efficient than lists requires a “black-magic” instantiation of the Map functor.
Python has excellent and numerous bindings to plotting and mathematical libraries. These libraries are particularly useful for teaching “programming for scientists” courses. However, there is no equivalent in OCaml, and it currently involves a lot of hard work to develop bindings for Python libraries.
Proposals
There are a number of things that we can do to make life significantly easier for OCaml beginners — and for OCaml teachers.
Syntax of semicolons
In OCaml, instructions (expressions of type unit
) can
usually be terminated by an optional semicolon. For example, a semicolon
is accepted in front of the end
keyword, in front of the
done
keyword, in front of the vertical bar separator
(|
) in pattern matching… but is rejected in front of the
else
keyword! This lack of consistence in the language is
particularly problematic because it forces one to explain very early in
an OCaml course that the semicolon is not a terminator but a
separator for expressions. Moreover, it makes is hard for
beginners to rearrange the lines of a program. We suggest that the
semicolon should be accepted in front of the else
keyword.
This change should not introduce any backward compatibility issue, since
we are only relaxing the syntax.
Syntax of reference updates
To increment the content of a reference r
, one can write
r.contents <- r.contents + 1
. This formulation makes
perfect sense: r
denotes the name of a box (the reference),
r.contents
denotes the contents of this box, and
r.contents <- v
is the syntax to update the contents.
Because writing .contents
all the time is very cumbersome,
we introduce the OCaml notation !r
as a shorthand for
r.contents
. However, beginners then expect to be able to
write !r <- !r + 1
.
Unfortunately, the operator (!)
is defined as a function
and not as a primitive construction for .contents
. So, we
have to teach the form r := !r + 1
, which appears to be
very confusing for students, who tend to write either
!r := !r + 1
or r := r + 1
, but never the
right form. The reason is that they —rightfully— expect to refer to the
content of the reference on both sides of the affectation operator.
There is a simple way to address this problem: extend the syntax in
order to parse !r <- v
as a primitive construction. This
would be no more ad-hoc than the current parsing of
t.(i) <- v
, and would allow teachers to simply say that
<-
is the affectation operator both for references and
for arrays, and that !r
is the way to refer to the contents
of a box. Beginners would be able to write the symmetric form
!r <- !r + 1
, until they become ready to appreciate the
more elaborated forms r := !r + 1
and
incr r
.
Remark: the ambiguous form !t.(i) <- v
should be
resolved as (!t).(i) <- v
, to be consistent with the
parsing of !t.(i)
.
The pre-parser
The purpose of the pre-parser is to reject any program that is not reasonably indented, thereby anticipating on potentially complicated syntax and type errors that the OCaml compiler would report. Moreover, it would impose good coding style to students, a “feature” valued by many teachers using Python. Note that the pre-parser is not enforcing very strict rules about the indentation (unlike an automatic-indentation tool would do). Instead, it is designed to accept all reasonable OCaml styles.
For example, the pre-parser would impose the body of loops, of function bodies, and of cases of pattern matching to be consistently indented. For let bindings, it would impose
- that a
let
be at the beginning of a line where it appears, - that a
in
be the last word of the line where it appears, - that, if the body of a
let
spans over more than one line, then there is a line return immediately after the=
sign.
Moreover, to detect missing parentheses in arithmetic expressions,
the pre-parser would impose infix arithmetic operators to be surrounded
by the same number of spaces on each side (e.g. x-1
and
x - 1
are accepted, but not x -1
).
The pre-typechecker
The pre-typechecker is able to typecheck code written in the core
fragment of OCaml (the fragment used for teaching to beginners). Its
purpose is to report easier-to-understand error messages. One key
ingredient is the error message associated with function applications.
If a call of the form f x y
fails to typecheck, the tool
would report a message of the form “the application does not type-check
because f
has type 'a -> 'a -> int
and
x
has type int
and
y has type float
”. Note that the pre-typechecker does not
need to be as efficient as the real OCaml type-checker, so we can easily
keep track of additional information (e.g. all unifications pairs) that
may be useful to reporting conflicts.
The Easy library
We propose a module called Easy
to be added to the
standard library in order to provide beginners with several useful
functions.
A function
print
of type'a -> unit
, that recurses in data structures except functions and cyclic structures, limiting the depth in order to avoid overflow of the standard output.A function
backtrace
of typeunit -> unit
, that prints an easy-to-read backtrace.Polymorphic arithmetic operators
(+)
,(-)
,(*)
and(/)
of type'a -> 'a -> 'a
, and unary negation of type'a -> 'a
. For types others thanint
andfloat
, an error could either be produced at runtime (requires representation of types at runtime), or an error could be produced by the (pre-)type-checker if it sees that the polymorphic type was not instantiated with a numeric type (this would preclude polymorphic definitions such aslet sq x = x * x
);Pre-instantiated versions of the
Set
andMap
functors using the built-in comparison operator. For exampleSet.add
would be exported as a function of type'a -> 'a Set.t -> 'a Set.t
.
The -bigint flag
We propose to extend ocamlc
and ocamlopt
with a flag called -bigint
, whose purpose is to replace
fixed-size integers with arbitrary-precision integers in the compilation
chain. This feature would avoid the need for premature discussions about
overflows. (Besides, it would be very useful in the context of producing
mechanically-verified software.) Implementing the -bigint
flag requires only minor modifications to the compiler for 64-bit
architectures. For string and array accesses, we need to check a single
bit to make sure that the index provided indeed is an acceptable index.
For indices of for-loops, we need to generate slightly different code
for the manipulation of the loop index.
The -easy flag
We propose the introduction of an option -easy
to the
tools ocamlc
and ocamlopt
. The purpose of this
flag is to:
- run the pre-parser,
- run the pre-typechecker,
- automatically open the
Easy
library, - activate the
-bigint
compilation flag, - ensure that the compiled program reports locations, index and size parameters on out-of-bounds errors.
The ocamleasy tool
We propose the introduction of a tool called ocamleasy
whose purpose is to
run of
ocamldep
and the automatic inclusion of all the dependencies and libraries that are required by the source file provided,compile the program using either
ocamlc -easy
, orocamlopt -easy
, depending on whether the option-opt
is provided to {},execute the program.
Conclusion
Many of the proposals can be implemented in just a few hours of work. Others require a several days of work, but this investment will certainly be negligible compared with the amount of time saved by people learning OCaml. We encourage people teaching using OCaml to give feedback on the proposals and complete the list with their own suggestions. We encourage the OCaml experts to report any incompatibility they can think of between the “easy” mode proposed and the advanced features of the language. Finally, we encourage the OCaml developers to help us implement and release the proposals, so that they become available to the world before people give up completely on the idea of using OCaml for teaching programming languages.