# The various namespace proposals for OCaml: a summary

November 7th, 2012

There have been many different proposals during the recent discussion of 'namespaces' for OCaml. I have listed, in no particular order, one proposal by Martin Jambon, one by Fabrice Le Fessant, one by Alain Frisch, and one by Nicolas Pouillard.

All proposals are rather complete and precise, but they use slightly different concepts and vocabulary, so it is difficult to compare them and to know which parts are essential and which are less important to evaluate. In the present document, I try to present those four proposals in a unified manner, to make comparison between them easier. This is not meant to be an opinion piece: there is no judgment of whether the differences exposed are good or bad. I may still have misunderstood some aspects of the proposals, so feel free to correct me if necessary.

The discussed proposals are included, for reference, at the end of the document, if only to convince you that my "synthesis" is *not* longer than the four proposals concatenated!

The document is split in five parts:

1. A lexicon, and a description of how the current OCaml implementation resolves module references to compilation units.

2. A precise discussion of the linking problem; the linker depends on choices made earlier during compilation (how dependencies are recorded in each compiled file) and puts constraints on our ability to disambiguate.

3. A precise (formal?) framework to think about how the current resolution semantics (and eventually the various proposals) relate module references in the OCaml source code to compilation units in the filesystem. I came to this framework by comparing and trying to synthesize the various proposals, and this is the novel aspect of the document.

4. A description of how, to my understanding, the current proposals fit in this framework.
I'm not trying to completely sum up or rewrite the proposals, but to describe in a unified manner a specific aspect of them (the module reference -> compilation unit relation) that I think is crucial. Of course, I may have misunderstood some aspects of the proposals (feel free to correct me and make suggestions), and the framework does not encompass all aspects of all the proposals.

5. Some discussion of how we should evaluate the proposals and which aspects have not been considered in the unified presentation. While parts (1) and (2) aim at absolute objectivity, this one is more subjective and is warmly open to further discussion.

PS: I'm considering putting the synthesis on a wiki somewhere, to be able to evolve the document according to your remarks, and even allow you to contribute directly if you wish. I'm not sure where to host it; in particular, it would probably be best if it was non-discoverable, so as not to disturb the current semi-private nature of the discussion. Any suggestions appreciated.

# Part 1: A precise description of the way module names in OCaml source relate to external compilation units

## Lexicon: 'module', 'module name', 'compilation unit', 'module reference' and 'namespace'

This section gives a more precise definition of the names that are used in the document. If you are familiar with the usual OCaml definitions, you can skip this section. The important point is the distinction between a 'module' (a semantic object of the OCaml language, designated in a program by module names, paths and references) and a 'compilation unit' (some data on the filesystem, representing a part of an OCaml program, either in source or compiled form).

- A 'module' is a semantic concept of the OCaml language. Roughly, it's a record whose components are addressed by identifiers rather than labels. It has a type/signature, and can carry types, exceptions and submodules as well as values.
I won't consider functors in this summary -- though they may be important/interesting to consider in a complete proposal, as Nicolas suggested.

- A 'module name' is an OCaml syntactic object referring to a module. Lexically, it is a capitalized OCaml identifier. Module names are usually considered as parts of 'module paths', that is sequences of '.'-separated module names, describing a module as a submodule in a hierarchy of nested modules. The current synthesis won't have a use for the "module path" concept (except when defining the semantics of the existing `open` construct). We are interested in the relation between module names and compilation units, and only the head module name of a path may refer to a compilation unit, as those cannot be nested. I will use 'M' as a metavariable to denote module names.

- A 'compilation unit' is a file containing OCaml source code, meant to be passed to the compiler. Note that it is really "a chunk of OCaml code": we could imagine referring to compilation units using URLs, or some units being stored inside a database or internal IDE representation. I will use 'U' as a metavariable to denote the path of a compilation unit.

  In the current semantics of the OCaml language, and in all reviewed proposals, each compilation unit also implicitly gives birth to a 'module', whose members are all the toplevel declaration phrases of the compilation unit. In the current semantics, the 'module name' of this module is implicitly determined by the source file's path in the filesystem (the capitalized basename, also called 'unit name' in the OCaml manual). I will denote by 'mod(U)' the module name derived from the compilation unit path U.

  The OCaml manual (http://caml.inria.fr/pub/docs/manual-ocaml/manual020.html) considers a compilation unit to be defined by both the .ml file and its extra-linguistically related .mli file.
In practice, both could be freely moved around (a .ml alone determines an implicit signature, and a .mli alone can be compiled and used as an interface) and this was considered in the present discussion, so I refined here the notion of compilation unit to a single file. We could speak of 'interface unit' and 'implementation unit' to distinguish them. We could similarly distinguish a .ml and its related .cmo ('source unit' and 'compiled unit'?) but there is no risk of confusion in practice.

- A 'namespace' is a new concept of the OCaml language, that is being defined by the various proposals. Roughly, a namespace is an identifier (whose syntax is proposal-dependent) that refers to a place where compilation units live; the name and the "place" are considered one and the same thing.

  There are various intuitions around the 'namespace' concept. For some, namespaces should be considered 'open': there would be no intra-linguistic way to list *all* the units present in a given namespace; it is only possible to say that some specific modules (among others) live in a given namespace. For others, namespaces are just specific modes of use of modules: they are defined once and for all and cannot be extended.

  I will use 'N' as a metavariable to denote namespaces. In my abstract syntax, I will describe namespaces by ':'-separated lists of lowercase words (for example, std:data). There is also an empty namespace ε, and we equate N:ε = ε:N = N. All proposals add a new way to refer to a module in OCaml code: it is possible to prefix a module name by a namespace, N|M.

  Of all the considered proposals, only Alain's doesn't explicitly use a concept of 'namespace'. I will still use the 'namespace' name to designate a concept of Alain's proposal (in-code references to 'namespace maps'), as it is consistent with the other proposals, in particular Nicolas's which is very close to Alain's.
In particular, 'namespace maps' in Alain's proposal do not have a hierarchical structure; this is a special case where namespaces have at most one component.

Note that this is just abstract syntax. Concrete syntax choices differ between proposals (most of them use '.' instead of ':' and '|', Alain uses '..' for '|', etc.).

## Motivation for the various proposals

The motivation for the 'namespace' proposals is to solve issues with the current way the OCaml toolchain resolves module references (module paths in the semantic world of OCaml programs) by searching for compilation units (in the pragmatic world of the filesystem). This is an 'extra-linguistic' aspect of the language.

The current toolchain searches for compilation units whose unit names correspond to the toplevel module names (the head module names of module paths) used in the OCaml program being processed.

http://caml.inria.fr/pub/docs/manual-ocaml/manual022.html#toc87

There is an extra-linguistic concept of "load path", which is a *set* of filesystem directory paths defined at each tool invocation by configuration, shell environment, and the passing of "-I dir_path/" options to the tool. For each free toplevel module name, a corresponding interface compilation unit is searched in the load path.

The main defect of the current scheme is that there may be unit name conflicts in the search directories: if two libraries whose locations are included in the search path have a compilation unit with name Foo, the reference Foo is ambiguous and the toolchain will choose either of them arbitrarily without leaving any choice to the programmer -- there is no way she could refer to either of those two unambiguously.

OCaml code providers do not have the discipline to uniquify unit names; unique names could be inconvenient, especially when using them in OCaml code. There is no idiomatic way to make this less heavy, and existing codebases have not taken unit name contention into account.
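The ambiguity described above can be made concrete with a small sketch. It is only a toy model under stated assumptions: the load path is represented as a list of in-memory directories rather than real filesystem lookups, and the names `dir`, `unit_name`, `candidates` and the `libA`/`libB` contents are all hypothetical.

```ocaml
(* Toy model: a directory is a path plus the file names it contains.
   Nothing here touches the real filesystem. *)
type dir = { path : string; files : string list }

(* Unit name of a compiled interface: capitalized basename without ".cmi". *)
let unit_name file =
  String.capitalize_ascii (Filename.chop_suffix file ".cmi")

(* All compiled interfaces in the load path that could answer module name [m]. *)
let candidates (load_path : dir list) (m : string) : string list =
  List.concat
    (List.map
       (fun d ->
          d.files
          |> List.filter (fun f ->
                 Filename.check_suffix f ".cmi" && unit_name f = m)
          |> List.map (fun f -> Filename.concat d.path f))
       load_path)

(* Two hypothetical libraries that both ship a unit named Foo. *)
let load_path =
  [ { path = "libA"; files = [ "foo.cmi"; "bar.cmi" ] };
    { path = "libB"; files = [ "foo.cmi"; "baz.cmx" ] } ]

let foo_candidates = candidates load_path "Foo"
let bar_candidates = candidates load_path "Bar"
```

Here `Bar` has a single candidate, while `Foo` has two: nothing in the source program can say which one is meant, which is exactly the conflict the proposals try to address.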
All proposals discuss the following question: how to associate OCaml module references to compilation units in a way that is easier to control and more resilient to unit name conflicts? The module references prefixed by 'namespaces' should behave better. Some of the proposals also address related issues such as:

- how to preserve that disambiguation information in later stages of the compilation process, in particular linking, which obeys different rules. That is a very important aspect -- that I had shamefully neglected in previous versions of this document -- as it's no good to disambiguate for the typechecker if the linker is still confused.

- providing dependency information to the toolchain, in particular so that `ocamldep` continues to work correctly

- `open`ing name hierarchies and importing opening/renaming/shadowing decisions (at the semantic level of OCaml programs)

# Part 2: A description of the current linker behavior

Summary: this section discusses the highly relevant issue of linking, and is necessary to understand some aspects of the proposals (and the flaws of the current state of some others, including previous versions of this document). However, my conclusion is that we can solve the problem internally, without asking for user intervention and independently of namespace choices; if my solution works -- which is not so certain -- the linking problem becomes irrelevant to the namespace discussion, and you can jump to the next section directly.

As described above, when type-checking a module, references to external compilation units are resolved. Those external compilation units are called "imports" of the current compilation unit. When a .cmi or .cmo is saved for the current unit, the name of each import, along with a checksum of the import interface (.cmi), is saved in the compiled object (.cmi or .cmo).
The name of the current module and its interface checksum (found by looking for the corresponding .cmi) are also saved; that is, "imports" actually contain the names and interface checksums of all dependencies *and* of the current module.

Those names and checksums are used later by the linker: each compilation unit is compiled independently into a .cmo, and a set of .cmo is later brought together to form a complete executable. The set of .cmo to link must be in topological order (each .cmo must come after all its dependencies), and this is checked with the list of imported modules contained in each .cmo. So currently, the check that "this dependency is satisfied" relies on module names. Besides, the linker also checks that checksums are correct (to avoid linking together modules compiled against different interfaces of the same name): each dependency must correspond to a .cmo that was given earlier, so we have its "official" checksum and can compare the checksum stored in the current .cmo for this dependency against it.

That reliance on module names to associate modules to their dependencies means that the linker, as well as the typer, needs help for disambiguation. It's useless to help the typer distinguish two different .cmi of the same name if, later, the linker is unable to combine two .cmo of the same name, or even to say which of those a given module implementation depends on.

## Details: .cm(x)a, .cmx, -pack

As far as I understand, .cma files (respectively .cmxa), for linking purposes, are just sets of .cmo files (resp. .cmx) that are not all linked: only those that are actually used are. In other words, passing a .cma is like passing all the .cmo it contains whose name is present as a dependency of one of the other needed modules.

Native compilation behaves a bit differently (from bytecode compilation) as it relies on information from the *implementation* of external modules for code generation (simple cross-module optimizations, etc.).
A .cmx contains not only the name and interface (.cmi) checksum of all the modules it depends on, but also the checksums of the compiled implementations (.cmx) of those dependencies. During linking, those are also checked for consistency.

There is currently no way to change the module name stored in a compilation unit. If I understand correctly, the `-pack` option was initially designed to do this (rename the packed .cmo/.cmx as part of the hierarchy instead of isolated modules), but it turned out that changing the internal name of a .cmx was an inherently gory and non-portable operation that couldn't be made to work easily on all architectures/systems. The `-for-pack` option allows choosing an internal name that is not exactly derived from the file name, but it's still set in stone at compilation time.

## How current proposals handle the linking problem

The linker puts hard constraints on the disambiguation capacity of the whole toolchain. Only Fabrice and Alain have really discussed this aspect. I will describe how their proposals address the issue, and make a new proposal that is quite independent from the rest of the namespace handling question -- my point is that it can be made internal and never seen by the user.

The "linker issue" needs synchronization between the compiler, which decides what "linktime information" to store in compiled .cm{i,o,x} files, and the linker, which uses that information to produce the executable. This linktime information can be relatively independent from what the user sees or how she uses the code and compiled files, and she doesn't access it -- except in debugging tools that dissect the content of a compiled file.
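To make the role of this linktime information more tangible, here is a minimal sketch of the name-and-checksum check performed on bytecode units, as described earlier in this part. It is a toy model, not the actual linker: the record `unit_info`, the error type and `check_link` are hypothetical names, digests are plain strings, and a unit's entry for itself in its imports is left out for brevity.

```ocaml
(* Toy model of the link-time information stored in a compiled unit. *)
type unit_info = {
  name : string;                      (* module name of the unit *)
  intf_digest : string;               (* checksum of its own interface *)
  imports : (string * string) list;   (* (dependency name, expected digest) *)
}

type link_error =
  | Missing_dependency of string * string      (* unit, dependency *)
  | Inconsistent_interface of string * string  (* unit, dependency *)

(* Check units in the order given: each import must name a unit seen
   earlier, with the same interface digest. *)
let check_link (units : unit_info list) : (unit, link_error) result =
  let rec go seen = function
    | [] -> Ok ()
    | u :: rest ->
      let bad =
        List.find_map
          (fun (dep, digest) ->
             match List.assoc_opt dep seen with
             | None -> Some (Missing_dependency (u.name, dep))
             | Some d when d <> digest ->
               Some (Inconsistent_interface (u.name, dep))
             | Some _ -> None)
          u.imports
      in
      (match bad with
       | Some e -> Error e
       | None -> go ((u.name, u.intf_digest) :: seen) rest)
  in
  go [] units

(* Hypothetical units: B depends on A. *)
let a = { name = "A"; intf_digest = "d1"; imports = [] }
let b = { name = "B"; intf_digest = "d2"; imports = [ ("A", "d1") ] }

let in_order = check_link [ a; b ]
let out_of_order = check_link [ b; a ]
```

Since `seen` is keyed on the module name alone, two units named `A` cannot be told apart in this model, which is the constraint the rest of this part tries to lift.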
We only need to make sure that:

- the stored information is consistent with the user's expectation: when she compiles a compilation unit against a given library, the dependency stored really corresponds to this library, so that the linker links it with the library's implementation and not something else

- there are as few conflicts as possible, e.g. two different external dependencies do not get assigned the same linktime information

The two proposals that consider the linking problem suggest very different (symmetric?) solutions:

- Fabrice's idea is to use the user-visible namespace, in addition to the file-derived module name, as part of the linktime information. Conflicts are avoided by having a discipline of unique namespaces (e.g. containing personal information on the code distributor in the spirit of Java namespaces: inria.stdlib.unix, ocamlpro.tryocaml, rwmjones.libvirt). Operations are provided to "open" or rename namespaces to avoid using those long unique names all the time.

- Alain's idea is to put the uniqueness information in the module name, and to use namespaces to provide palatable short names to designate those painfully long modules. This has the advantage of requiring no change to the way the linktime information is computed and stored, which simplifies implementation, tools handling etc.

In both cases, the burden of avoiding conflicts is on the users. This is probably a reasonable burden assuming enough discipline and a bit of centralization (GODI, Oasis-DB, etc.) to detect and fix conflicts as early as possible.

## Independence from user-visible naming decisions

Both proposals chose to use user-visible information as part of linktime information to help uniqueness. It is actually not necessary: we can discuss user-facing naming/addressing choices and compiler-facing linktime information completely separately.

The point of namespaces (see later sections) is to give OCaml-side names to compilation units.
Any solution (including the current implementation) can be understood as a mapping from "module references" to compilation units in the filesystem.

We could consider linktime information as completely independent information. At compilation time, developers (or maybe some automated process, see next subsection) decide what linktime information will be associated to the compilation unit being produced. They must do their best to ensure that this linktime information is unique.

Splitting those two aspects allows us to imagine, for example, two different addressing hierarchies: one designed to face the user and provide a logical organization of modules (in the spirit of Haskell's `Data.Array.Persistent` module names), and the other to help linktime disambiguation, using hopefully-unique organization names (`janestreet.core.persistent_array`).

## An attempt at silently fixing the problem

I believe it is actually possible to choose non-conflicting linktime information automatically, without involving user intervention. The idea is to internally generate a seed that is added to the module name (just as Fabrice's namespaces are):

- when creating a .cmi, generate a random seed and add it to the stored compilation unit name

- when compiling a .cmo or .cmx, extract the seed from the relevant .cmi and insert it in the current file

- when resolving an external reference to a compilation unit, add the seed (as well as the name) to the dependency information of the current module; this is naturally done by simply copying the whole "compilation unit name" information

- when linking compilation units, dependencies are matched on both the module name and the seed (again this requires no change, the whole "name" component is manipulated).

This means that two independent .ml compiled against independent .mli will always be correctly differentiated, even if they have the same name, as the seeds differ (...
with high probability; if it doesn't work, clean and retry!), even if the .mli are exactly the same content-wise.

There are a few more recompilations than previously: the .cmo, and depending modules, need to be recompiled as soon as the .cmi is *recompiled*, rather than when the .cmi content *changes*, but this is arguably a small difference. If you didn't change the .mli, just don't recompile it! We could add a heuristic to not change the seed if there already exists a .cmi with the same checksum. The important point (and the reason to generate the seed on .cmi creation rather than on .cmo creation) is that for bytecode compilation, implementation changes do not force recompilation of dependencies.

A corner case where this doesn't provide perfect disambiguation is the following: I have only one .mli, but two .ml that I may want to compile against it (say `foo.mli` but `foo_windows.ml`, `foo_unix.ml`; I copy `foo_windows.ml` to `foo.ml`, compile, move `foo.cmo` to `foo_windows.cmo`, and again with `unix`). I believe this is possible with the current implementation -- though certainly not advised by the manual/specification. It is not currently possible to link both foo_windows.cmo and foo_unix.cmo together in an application, but the proposed "cmi seed" method also wouldn't do it directly: you would have to explicitly recompile the .cmi between each compilation, to get different seeds.

# Part 3: An abstract, unified presentation of compilation unit resolution

In this part, I'm not considering linking matters anymore: only the way the compiler resolves external module references to compilation units located at some place in the filesystem.

The current OCaml compiler interleaves two different aspects:

- type-checking of the source file currently being processed
- filesystem lookups to decide which compilation units could correspond to a free module reference.
However, it is also possible to define the semantics in two steps: a first search pass to collect/define an initial "compilation unit environment" mapping module references to compilation units, and a second, search-free type-checking pass. This decomposition makes comparing different proposals easier.

A compilation unit environment is built by the following procedure:

- iterate on all directories present in the load path
- for each directory, iterate on all the compiled interface compilation units (.cmi) in the directory
- for each such unit U, add the association (ε|mod(U) -> U) to the compilation unit environment (ε is the empty namespace).

This first pass builds a compilation unit environment. It does not depend on the program being typechecked, but environment variables and `-I` options passed to the compiler may influence the load path.

After this first pass, the current source can then be typechecked: when encountering a free module name, the typechecker accesses the compilation unit bound to it in the compilation unit environment. In an implementation-specific way, it extracts the signature from the compilation unit, and adds this interface to the 'typing environment' of the current program (note that typing environment and compilation unit environment are two distinct notions: the typing environment relates OCaml program identifiers to types and signatures), and typechecking then proceeds as usual.

This is only a description of the semantics, not of an implementation. You could imagine a naive implementation with a first pass strictly building an environment, or the current search-on-occurrence implementation, which can be seen as an optimization of the first, making environment building lazy.

When trying to compare and relate the different proposals, I discovered that they could be articulated around this compilation unit environment process.
More specifically, they suggest new ways to build the initial environment:

- when adding the binding (R -> U), do not automatically derive the module reference R from the unit name, but use a module reference R determined by other means (a configuration file in the directory, etc.)
- when traversing a directory, do not handle all compilation units but only some of them, as defined in a configuration file, etc.
- instead of searching a load path, use an arbitrary mapping as defined in a mapping file passed to the compiler...

## Formal structure of hierarchical compilation unit environments

In the most general case, compilation unit environments are trees recursively defined as having two components:

- units: a (possibly empty) mapping from module names to compilation units
- subenv.: a (possibly empty) mapping (C ↦ T) from namespace components to compilation unit (sub)environments

The empty environment has two empty maps.

For example, the following compilation unit environment maps foo:bar|Baz to "baz.cmo":

    {
      Baz -> "/tmp/foo.cmo"
      foo -> { bar -> { Baz -> "baz.cmo" } }
      foobar -> { }
    }

This is slightly more refined than a simple mapping from module references to compilation units, as we can make a difference between namespaces that are defined in the environment but empty (here 'foobar'), and namespaces that do not exist in the environment. A previous version of the document considered all possible namespaces to exist and be empty by default, which made certain errors impossible (`open-namespace N` would always succeed; in some cases we want an error because the namespace N does not exist). It is still possible to represent the previous behavior, by considering as "default environment" an environment where all possible namespaces are defined as empty.

We can define a `merge` operation that takes the union of two environments E₁ and E₂. It returns:

- the union of the units mappings
- a subenv.
mapping defined by, for each component C of E₁ or E₂, the merging of both subenvironments E₁(C) (or the empty environment if undefined) and E₂(C) (or the empty environment if undefined)

## Two semantics for the `open` construct

This two-pass semantics allows simple explanations of `open` semantics for namespaces.

The current semantics of `open` cannot be explained as an action on the compilation unit environment, but only as an action on the complete typing environment of an OCaml program: it looks for the module (OCaml semantic object; not necessarily associated to a compilation unit, e.g. possibly a submodule of a compilation unit, or the result of a functor application, etc.) denoted by its argument, and adds all its declarations to the environment.

We can define a new `open-namespace N` construct, reflecting constructions used in the proposals (again, this is not a choice of concrete syntax; all proposals but Alain's only use `open`). `open-namespace N₁` is defined by its action on the compilation unit environment. Informally, for each binding of the form (N₁:N₂|M -> U) in the environment, (N₂|M -> U) is also added to the environment. More precisely, there are two possible choices of semantics:

- a "merging" open, that follows the intuition that namespaces are "open" (sic): `open-namespace N₁` finds the subenvironment E₁ at N₁ in the current environment E (or fails if it doesn't exist), adds its units to the units of E, and merges each subenvironment (C:E₂) of E₁ into the current subenvironment E.C.

- a "shadowing" open, that respects the semantics of "closed" namespaces that cannot be extended after definition: `open-namespace N₁` finds the subenvironment E₁ at N₁ in the current environment E (or fails if it doesn't exist), adds its units to the units of E, and uses each subenvironment (C:E₂) of E₁ to *replace* the current subenvironment E.C.

(On units, different behaviors are possible in case of conflict: error if the mappings differ, or silent shadowing.)
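The tree structure, the `merge` operation and the two variants of `open-namespace` can be sketched in a few lines of OCaml. This is only an illustration under assumptions of mine: compilation units are represented by their path as a string, namespaces by lists of components, unit conflicts are resolved by silent shadowing, and all the names (`env`, `find`, `open_merging`, `open_shadowing`, `lookup`, the `example` environment) are hypothetical.

```ocaml
module SMap = Map.Make (String)

(* A compilation unit environment: units at this level plus named
   subenvironments. *)
type env = { units : string SMap.t; subenvs : env SMap.t }

let empty = { units = SMap.empty; subenvs = SMap.empty }

(* Union of two environments; on a unit clash the binding of [e2] wins
   (silent shadowing, one of the behaviors the text allows). *)
let rec merge e1 e2 =
  { units = SMap.union (fun _ _ u2 -> Some u2) e1.units e2.units;
    subenvs =
      SMap.union (fun _ s1 s2 -> Some (merge s1 s2)) e1.subenvs e2.subenvs }

(* Find the subenvironment at namespace [ns] (a list of components). *)
let rec find ns e =
  match ns with
  | [] -> Some e
  | c :: rest ->
    (match SMap.find_opt c e.subenvs with
     | None -> None
     | Some sub -> find rest sub)

(* "Merging" open: subenvironments of N are merged into the current ones.
   Returns None when the namespace does not exist. *)
let open_merging ns e =
  match find ns e with
  | None -> None
  | Some e1 -> Some (merge e e1)

(* "Shadowing" open: subenvironments of N replace the current ones. *)
let open_shadowing ns e =
  match find ns e with
  | None -> None
  | Some e1 ->
    Some { units = SMap.union (fun _ _ u1 -> Some u1) e.units e1.units;
           subenvs = SMap.union (fun _ _ s1 -> Some s1) e.subenvs e1.subenvs }

(* Resolve the reference ns|m to a compilation unit, if any. *)
let lookup ns m e =
  match find ns e with
  | None -> None
  | Some sub -> SMap.find_opt m sub.units

(* { a -> { X -> "a/x.cmi"  sub -> { Y -> "a/y.cmi" } }
     sub -> { Z -> "z.cmi" } } *)
let example =
  { units = SMap.empty;
    subenvs =
      SMap.empty
      |> SMap.add "a"
           { units = SMap.singleton "X" "a/x.cmi";
             subenvs =
               SMap.singleton "sub"
                 { empty with units = SMap.singleton "Y" "a/y.cmi" } }
      |> SMap.add "sub" { empty with units = SMap.singleton "Z" "z.cmi" } }
```

On `example`, opening `a` in the merging variant leaves both `sub|Y` and `sub|Z` reachable, whereas the shadowing variant replaces the toplevel `sub` and `sub|Z` is no longer bound.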
My previous synthesis only described the "merging open". Fabrice noticed that this failed to account for his proposal, which uses a "shadowing open", consistent with the behavior of OCaml modules.

## Other operations on environments

The tree-mapping structure of compilation unit environments is quite flexible, and a lot of operations can be defined in this framework. It is easy for example to adapt the `open-namespace N₁` definition into a more general `alias-namespace N₁ to N₂` (which adds the subenvironment located at N₁, but prefixed with N₂), in two "merging" and "shadowing" variants. This "prefixing" operation also describes the "include" operation of Nicolas's proposal.

There is a large design space of reasonable operations on compilation unit environments (projection, prefixing, deep merging, intersection, removal...). We have found that they are interestingly close (in particular the deep merging operation) to "mixin operations" as defined in the article "Mixin' Up the ML Module System", by Derek Dreyer and Andreas Rossberg, 2008/2011; this is not so surprising, as mixins are supposed to be "open" modules, and are therefore a good fit for (unrestricted) namespaces.

# Part 4: How do the current proposals relate module references to compilation units?

In Alain's and Nicolas's proposals, new mappings are added by files mapping OCaml identifiers or paths to compilation units (given by a filesystem path). Those mapping files:

- may be found in the search path (Alain's proposal)
- may be passed explicitly to the tool (Nicolas's proposal)

In Alain's proposal, the 'namespace' N associated to each of those mapping files is derived from the filesystem path of the mapping file. Each line of the mapping file associates a module name M to a compilation unit U, but the compilation unit environment is really enriched with the binding (N|M -> U): the mapping file defines the subenvironment N.
For example, neglecting concrete syntax differences, the file `ex.ns` containing `Foo: /tmp/a.cmi; Bar: /tmp/b.cmi` would enrich the current environment with the mapping

    ex -> {
      Foo -> "/tmp/a.cmi"
      Bar -> "/tmp/b.cmi"
    }

In Nicolas's proposal, the mapping files already have the structure of a compilation unit environment (including nested subenvironments), which is merged as is, without any influence from the mapping file's filename.

In Fabrice's and Martin's proposals, the environment-building pass does not act the same on all directories of the search path, depending on whether the directory contains a distinguished configuration file (Package, ocaml.ns). If not, the current semantics is used (all .cmi are added to the environment). Otherwise:

- in Martin's proposal, the Package file explicitly lists the compilation units U¹, U²... Uⁿ exported (I suppose the directory isn't searched for other compilation units), with an optional namespace N (default value ε). The mappings { mod(U¹) -> U¹, mod(U²) -> U² ... mod(Uⁿ) -> Uⁿ } are added to the subenvironment N.

- in Fabrice's proposal, the ocaml.ns file only specifies the directory-common namespace N. The directory is searched for compilation units: the mappings (mod(U) -> U), for each compiled interface compilation unit U in the directory, define a new subenvironment N.

In another variant of Fabrice's proposal, there is no configuration file, but each compilation unit U may optionally specify a namespace N. In this case the directory is scanned as usual, and (N|mod(U) -> U) is added to the environment. Should the usual (ε|mod(U) -> U) binding also be added? This is a flexibility point of the proposal; Fabrice suggests binding it only in the case where the currently processed unit does not mention namespaces ("compatibility mode").

Once the compilation unit environment is built, typechecking proceeds as usual in all proposals.
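As an illustration of two of these environment-building strategies, here is a hedged sketch that produces flat bindings (N|M -> U). Several things are assumptions of mine rather than part of any proposal: the `Name: path; ...` entry syntax is the simplified one used in the `ex.ns` example, the namespace of a mapping file is taken to be its basename without extension, and `binding`, `of_mapping_file` and `of_directory` are hypothetical names.

```ocaml
(* A binding N|M -> U, with N given as a list of components. *)
type binding = { ns : string list; modname : string; unit_path : string }

(* Alain-style: the namespace comes from the mapping file's name, and
   each "Name: path" entry binds one module in that namespace. *)
let of_mapping_file ~filename ~contents =
  let ns = [ Filename.remove_extension (Filename.basename filename) ] in
  String.split_on_char ';' contents
  |> List.filter_map (fun entry ->
         match String.index_opt entry ':' with
         | None -> None  (* skip empty or malformed entries *)
         | Some i ->
           let modname = String.trim (String.sub entry 0 i) in
           let unit_path =
             String.trim
               (String.sub entry (i + 1) (String.length entry - i - 1))
           in
           Some { ns; modname; unit_path })

(* Fabrice-style: one namespace (read from ocaml.ns, here passed as [ns])
   applies to every compiled interface found in the directory. *)
let of_directory ~ns ~dir ~files =
  files
  |> List.filter (fun f -> Filename.check_suffix f ".cmi")
  |> List.map (fun f ->
         { ns;
           modname = String.capitalize_ascii (Filename.chop_suffix f ".cmi");
           unit_path = Filename.concat dir f })

let from_map =
  of_mapping_file ~filename:"ex.ns"
    ~contents:"Foo: /tmp/a.cmi; Bar: /tmp/b.cmi"

let from_dir =
  of_directory ~ns:[ "inria"; "stdlib" ] ~dir:"lib"
    ~files:[ "list.cmi"; "list.cmx" ]
```

In the first case the module name is chosen freely by the mapping file; in the second it is still derived from the file name, and only the namespace is configurable.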
When a proposal defined an `open` construct, I believe it always coincided with the behavior of the `open-namespace` construct defined in part 3, in either the "merging" or the "shadowing" variant. Note that some proposals use the same syntax for modules and namespaces: when encountering the concrete code `open Foo`, it is unclear whether Foo is a module or a namespace, and the semantics differ; ambiguity resolution rules are needed. In Alain's proposal, there is a distinct `open namespace Foo` syntax which resolves the ambiguity.

Remark: it is tempting to try to define the semantics of module renamings as an action on the compilation unit environment (from ε|M₁ to ε|M₂), but I think they rather belong, like `open` on modules, to the typing environment. In particular, you may want to alias a module *path* to a module name, to address submodules directly. This would require maintaining separate 'path substitutions', as done in a recent article by Hyeonseung Im, Keiko Nakata, Jacques Garrigue, and Sungwoo Park, for entirely different purposes.

# Part 5: How should we evaluate and compare the different proposals?

## What they compare against: uniquifying file names

Alain made an extremely simple proposition that doesn't require much change from the current behavior. He only suggested that library writers adopt the convention of using only 'unique' file names, to avoid any conflict issue -- in the typechecker as well as in the linker stage. To avoid having to use painfully long module names in an OCaml program, Alain suggested introducing a module name aliasing syntax, which would silently give a module name alias to an existing module path. Such renamings could also be included and shared in separate files, as requested by Yaron Minsky and proposed by Jacques Garrigue.

This proposal is not directly discussed further in my synthesis. As applying it would require changes to existing codebases, the consensus seems to be that it is too impractical.
However, it is good to keep it in mind when evaluating other proposals, as a 'placebo proposal' to compare against.

## Subjective comparison: User-side resolution of conflicts

Disclaimer: while the previous sections were purely an objective presentation of the different proposals in a way that hopefully made them easier to compare, the present section tries to evaluate each proposal on an important corner case, a module reference conflict. I partly discuss 'naturalness' or 'convenience' of different solutions, and it is therefore subjective and possibly misjudged.

One important criterion for the namespace proposals, as highlighted by Jacques Garrigue, is whether the *users*, rather than the producers, of two libraries are able to resolve a naming conflict.

Note that this is not a *necessary* feature for a namespace proposal. In particular, the 'placebo' proposal (just use long, hopefully-unique names) doesn't have any way to deal with this: if your names were not long and unique enough, well, you lose; you have to ask the code producers to rename them (or recompile them locally under a different filename, if you have the source code, which makes you a code producer). It's less of a problem if names are, by convention, long and hopefully unique than it is currently, with short common names.

Fabrice's proposal does not handle conflicts: once your compiled unit has a namespace recorded, you cannot change it from the code user side. If two compiled units are in the search path and have recorded the same namespace and module name, there is a conflict that, if I'm not mistaken, the user cannot resolve alone.

Alain's and Nicolas's proposals make it easy for the user to change name->unit associations in case of conflict, because it is their basic level of granularity. If the current namespace mapping attempts to bind foo/mod1.cmi and bar/mod2.cmi to the same module reference, just add two new distinct references to be able to refer to them unambiguously.
Nicolas's proposal is slightly more flexible as it allows easy renaming of whole mappings: if code providers A and B provide two respective namespace mapping files "foo/list.mln" and "bar/list.mln" that both bind compilation units to the data:list namespace, and you want a stronger distinction, whether some of those units are identical (conflict) or not, you can write your own "my/list.mln" mapping file, following the horrible (:-) surface syntax of Nicolas's proposal:

    module a = struct include "foo/list.mln" end
    module b = struct include "bar/list.mln" end

You can thereafter use the disambiguated namespaces a:data:list and b:data:list.

Alain's proposal doesn't have this level of abstraction (namespace maps are flat, first-order mappings), but he is very explicit that namespaces are to be processed by automated tools, which could achieve similar multi-remappings.

Martin's proposal is a bit half-way on this point: if the directories foo/ and bar/ both export mod.cmi *and* use the same namespace, you could copy them to foo_copy/mod.cmi and bar_copy/mod.cmi and use two user-defined Package files to remap those interfaces into two distinct, unambiguous namespaces. While this is fundamentally the same operation as in Alain's or Nicolas's proposals, it feels a bit awkward, as the proposal was clearly not optimized for this use case (compilation units are not expected to change packages, hence the implicit directory/unit relation). Still, it can be done purely from the user side.

Remark: it is useless to try to handle such conflicts if they are also present at the linking stage; remember, we can't change the internal name of a .cmx. In Fabrice's proposal for example, even if we found a way to distinguish two modules of the same name in the same namespace, the internal name would be the same and it would be impossible to link them both at the same time.
I believe that the internal seeding solution makes it possible to dissociate linking from module discovery, and can be applied in addition to each proposal.

## Aspects not discussed

I deliberately left out some of the proposals' aspects and surrounding discussions from the current synthesis.

- Effect on tools (ocamldoc, ocamldep, ocamlbuild, make, omake, ocp-build...). I'm not familiar with the tooling aspect of those proposals. Alain suggests that tools needing to designate a compilation unit U systematically use the "current semantics" designation (mod(U)), and let namespace-aware post-processing tools rename them to module references where appropriate. Unless tools actually keep the full reference R and only display mod(U), this would make those tools' output less conflict-resilient: possible ambiguity strikes back.

- Dependency analysis (really a sub-point of the above). Some proposals provide tools for dependency analysis. I concentrated here on the "risk of module reference conflict" + "resolution of unlikely conflicts" aspects. I think dependency analysis should be discussed independently.

- Intra-program mechanisms to act on the module environment. Surprisingly, most proposals concentrated on the extra-linguistic question of how to relate compilation units to module names, not on operations on module names themselves. Only Alain's 'placebo' proposal discusses this aspect, with Jacques Garrigue's intervention. Nicolas feels it should be discussed independently, and Fabrice proposes a namespace-aliasing construct. As I discussed earlier, module path substitutions are outside the scope of the compilation unit environment.

- Subtleties of when to use the 'new' semantics, assuming namespaces, or when to fall back to the current semantics. The details are a bit hairy and differ a lot between the different proposals. I think this is rather independent from the interest of each.
Note that Alain's choice to have a distinct namespace / module-path separator makes this a non-issue: the problem only arises when the same extended identifier may designate either a module path or a namespaced module reference.

- Possible relations with `pack` and functors. Nicolas expressed the idea that his proposal might evolve to express the "functor packing" construct independently suggested by Fabrice. I have not discussed this aspect at all, and don't know whether/how it relates to the compilation unit environment presentation.

## What next?

My goal in publishing this synthesis is to advance the discussion in a good direction. I'm interested in your feedback on this synthesis:

- Does it faithfully represent your proposal? Is the "formal" part correct? Are some other aspects missing? Could we describe them in this setting?

- Does the current presentation capture the idea of 'namespaces'? Are there some concrete problems that are not expressible in this framework?

Besides, I think this could be a good tool to evaluate and evolve the existing proposals. A natural idea is to take the union of all the "environment building operations" used in the different proposals, and see whether it is still satisfying, and what is missing or should be left out. I may try to describe such an approach later (help and suggestions appreciated), but I wished to publish and discuss this synthesis first.

Finally, I have two remarks about aspects of the synthesis that could warrant further elaboration.

The first is that this two-pass description actually results in a *closed environment* of namespaces at type-checking time. While in principle namespaces are considered open (all operations of the current semantics and the proposals only add new mappings to the environment), once we have an environment we can precisely enumerate all available compilation units and their namespaces.
In particular we could derive a module from a given namespace N: it would contain, for all (N|M -> U) in the environment, the module derived from U, under identifier M. This could be used to give a meaning to, say, giving a namespace to a functor. I have discussed this with Nicolas, but am unsure which conclusions to draw: my gut feeling is that it is not a good idea.

The second remark is that the unified presentation could be considered as defining a "language" for namespace files. In Nicolas's proposal it is already visible that namespace mapping files are some form of "source code", even if not part of usual OCaml compilation units. Alain, on the contrary, took great care to present his mapping files as *data* rather than programs, possibly the output of preprocessing tools. In Fabrice's proposal, the "language" to talk about namespaces is quite restricted and used (in some variants of his proposal) at the beginning of compilation units; filesystem and compilation option hints are also used. In Martin's proposal, as in the current semantics, the construction of the environment is completely implicit and by-convention, directed only by passing -I options to the compiler. How explicit and expressive do we want to be?