The new release of OCaml (4.01.0) was just announced! The run-up to a major release like this is normally a frantic time of testing that your favourite applications don’t break unexpectedly due to some incompatible language feature. This release cycle has been a little different though, as we’ve been working hard on using the OPAM package database to build an online regression testing infrastructure that mechanizes much of this process.
I wanted to share some of what OCamlot does today, some of the results from about 3 months worth of runs that may help OCaml package maintainers, and finally where we’re going with future developments. This work has been done in collaboration with David Sheets (who built the OCamlot daemon) and the OPAM team ably led by Thomas Gazagnaire at OCamlPro. We’ve also had a lot of help from Jeremie Dimino and Yury Sulsky from Jane Street and Dave Scott and Jon Ludlam from Citrix acting as guinea pigs for their respective regular releases of the Core and Xen/XAPI releases to OPAM.
Towards a truly portable OCaml
The upstream OCaml toolchain is built on very UNIX-like principles, with a number of command-line tools that form a build pipeline. This process usually ends with linking the intermediate object files with a runtime library that provides the garbage collector and other intrinsic OS functions.
While the compiler tools themselves are quite portable and work on almost any UNIX-like system, the build system scaffolding around third-party packages is less portable. Features such as C bindings often cause build breakage on less-used operating systems such as FreeBSD or OpenBSD, since they usually require probing for header file locations or adding custom CFLAGS before building.
Every Internet of Things starts with a tangled pile of ARM Dreamplugs.
And in our server room, venerable Sparc and PowerPC G4 OpenBSD boxen still live.
Finding older machines is getting tough, but here's Dave's old iMac G5 running Linux.
Most OCaml developers use x86-based machines, and so other architectures get less day-to-day testing (OCaml has superb support for fast native code compilation on ARM, PowerPC and Sparc32, and we’re working on MIPS64 here as part of the CHERI project).
We want to make sure that OCaml and its package ecosystem work just as well in the embedded world as they do on vanilla x86 Linux. This includes running on my Linux iMac G5, my FreeBSD Raspberry Pi, my OpenBSD Pandaboard, or even on a bytecode-only architecture like OpenBSD/Sparc64.
In the longer term, this paves the way to reliable cross-compiled packages for Windows, Android and iPhone (all of which have OCaml ports, but aren’t heavily tested with the full package database). The only practical way to get started on this is by building an automated test infrastructure for OCaml that explores the feature matrix (which, eerily for me, happened in the early days of Xen too, via XenRT to stabilize the hypervisor).
Why not Jenkins or Travis?
When we first started hacking on the OPAM testing infrastructure earlier this year, I maintained a local Jenkins installation that monitored the repository. While Jenkins is a superb tool for many continuous integration tasks, it fell over badly when we tried to use it on non-x86 architectures and non-Linux (or non-Windows) operating systems. Jenkins requires a full Java runtime stack on each client machine, which was taking more time to get up and running than a simple OCaml-based client and server that could compile to either portable bytecode or fast native code.
The other difficulty with OPAM is selecting which packages actually need to be tested, as it has a constraint-based package solver that supports expressive forwards and backwards version restrictions. While basic tests of the latest packages worked with Jenkins, we increasingly needed to customize it to interface directly with the OPAM libraries and calculate test schedules based on incoming change requests.
Another factor that ruled out depending on hosted services such as Travis is that they tend to support x86-only architectures, whereas we really want to test the full spectrum of CPU variants supported by OCaml. This doesn’t mean that there’s no place for Travis of course, and in fact Mike Lin has already made this work with OPAM.
For our full testing needs though, OCamlot was born: an OCaml client and server system which coordinates different versions of the compiler, architectures and OPAM versions and records the results for triage and fixing issues.
The latest alpha release of OCamlot is pretty straightforward to run locally, if you are so inclined. First start a server process:
```
$ git clone git://github.com/ocamllabs/ocamlot
$ cd ocamlot
$ ./install_deps.sh
$ oasis setup   # (edit lib/config.ml if you need to change ports)
$ make
$ ./_build/lib/ocamlot_cmd.native --help
$ ./_build/lib/ocamlot_cmd.native serve
```
The server listens on localhost only by default, and normally an SSL-to-TCP proxy is deployed to listen for external connections (I use stud, which is fast and easy to configure).
The OCamlot clients require a local compilation of OCaml, and they autodetect their local CPU architecture and operating system. I’ve put a gist script together that automates this on most Linux and BSD variants. Just customize the top variables (set gmake under BSD) and set the hostname to your server process.
The results repository and auto-triage
Once the client is running, the server dispatches tasks from its test matrix, which is calculated from the OPAM package repository. The server maintains a results repository, which is a Git filesystem database that records the build results and logs via an s-expression per task. It also uses Cohttp to serve up the results for a web browser.
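To give a flavour of what such a Git-backed results store holds, here is a minimal sketch of a per-task build result serialised to an s-expression. The record fields and format below are illustrative assumptions, not OCamlot’s actual schema:

```ocaml
(* hypothetical sketch: one build result per task, rendered as an
   s-expression suitable for storing as a file in a Git repository *)
type result = {
  package : string;   (* OPAM package name *)
  os      : string;   (* operating system of the client *)
  arch    : string;   (* CPU architecture of the client *)
  ok      : bool;     (* did the build succeed? *)
}

let to_sexp r =
  Printf.sprintf "((package %s) (os %s) (arch %s) (ok %b))"
    r.package r.os r.arch r.ok
```

Storing one small text file per task means results diff, merge and replicate for free via ordinary Git operations.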
It’s very convenient using Git repositories like this since we can use GitHub (or any other Git host) to coordinate and record results, and restart the server without needing any local state. So convenient, in fact, that Thomas and I have been working on a more formal engine for this called Irminsule (more on that in a later post, though).
It’s almost unheard of for a full OCamlot run to go by without some errors, and so David put together the ocamlot triage command. This takes the state repository and runs a set of regular expressions over the logs to classify failures into common errors. The full file is here, but an excerpt should give you an idea of what we look for:
```ocaml
(* ordered minor to severe *)
type analysis =
  | Solver of solver_error option
  | Dep of string * analysis
  | Transient of transient_error
  | System of system_error
  | Meta of meta_error
  | Ext_dep of ext_dep_error
  | Build of build_error
  | Multiple of analysis list
with sexp
```
The errors are ordered by severity to aid in colour highlighting. They start with OPAM solver failures and dependency failures (e.g. due to trying to build a package that requires a specific OCaml version that isn’t available), and move on to missing package dependencies or system libraries.
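The core of a triage pass like this is just matching failure signatures against each log and picking a class. Here is a minimal, self-contained sketch of the idea (not OCamlot’s actual code; the signatures and categories are illustrative):

```ocaml
(* hypothetical sketch: classify a raw build log by scanning for
   known failure signatures, checking the most severe class first *)
type analysis = Build | Ext_dep | Transient | Unknown

(* naive substring search over the log text *)
let contains log pat =
  let n = String.length pat and m = String.length log in
  let rec go i =
    i + n <= m && (String.sub log i n = pat || go (i + 1))
  in
  go 0

let triage log =
  if contains log "Error: " then Build             (* compile error *)
  else if contains log "command not found" then Ext_dep
  else if contains log "Connection timed out" then Transient
  else Unknown
```

A `Transient` result can simply be retried, while `Ext_dep` failures get routed to the depexts machinery described later, so only genuine `Build` errors need a human.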
Testing the run up to OCaml 4.01
Of course, running all these tests is useless without taking action on the results. I’ve been keeping track of them in issue #1029. The nice thing about GitHub issues is that when this bug is referenced in commits (even in other repositories) a cross-reference shows up on that webpage and lets everything be tracked nicely.
So what were the most common failures in the runup to 4.01, and what should you avoid when writing your own code?
Different standard library module signatures
There have been a few changes to some of the functor signatures in the standard library, such as the addition of a find function to Set (Mantis #5864). A third-party library that tries to match the old functor signature will fail to compile with a type error, such as this one for zipperposition.0.2:
```
Error: The implementation src/ptset.ml
       does not match the interface src/ptset.cmi:
       ...
       In module Big:
       The field `find' is required but not provided
```
The relevant code in zipperposition makes it clear what the problem is:
```ocaml
(* Big-endian Patricia trees *)
module Big : sig
  include Set.S with type elt = int
  val intersect : t -> t -> bool
end
```
This particular bug was reported upstream, and the fix requires implementing the find function for the Patricia-tree-based Set. Since the OPAM package will always be broken on OCaml 4.01, it was marked with a compiler version constraint to prevent it from being selected for installation under that compiler. When a new version with the fix is uploaded to OPAM, it will always be selected in preference to the broken one.
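For a set, the new `find` can often be derived from operations the module already provides, since the element "found" equal to `x` is just `x` itself. A minimal sketch of that shape of fix (this is not zipperposition’s actual patch; the module name and element type are only illustrative):

```ocaml
(* hypothetical sketch: satisfying 4.01's enlarged Set.S signature by
   deriving [find] from [mem]; on 4.01+ this harmlessly shadows the
   stdlib's own [find] *)
module Big = struct
  include Set.Make (struct type t = int let compare = compare end)

  (* [find x s] returns the element of [s] equal to [x], or raises *)
  let find x s = if mem x s then x else raise Not_found

  (* the extra operation from the original signature *)
  let intersect s1 s2 = exists (fun x -> mem x s2) s1
end
```

A real Patricia-tree implementation would walk the tree directly, but the derived version above is enough to restore compatibility with both compiler versions.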
One other 4.01 change that temporarily broke most of the bigger libraries such as Core, Lwt and Batteries was the addition of the close-on-exec flag to the Unix module. This change only affects packages that redefine the Unix module for their own purposes (such as adding an asynchronous I/O monad, as Lwt does), hence it hit the standard library replacement packages.
The fix here was to locally add patches to the relevant OPAM packages to immediately unbreak things when the fix went into the 4.01 branch of the compiler, and to notify upstream maintainers to release new versions of their projects. There’s a subtle problem here: when a patch such as this goes into an unreleased branch of the compiler (such as 4.01.0dev), it’s hard to reliably detect whether the user has the very latest version of the compiler or not. If you do have problems like this in the future, try recompiling via opam switch reinstall <version> to refresh to the latest branch.
It’s very useful to be able to drop bleeding-edge compiler tools into the OPAM repository using compiler constraints like this. For an example, see Alain Frisch’s ppx_tools, which requires the very latest 4.02dev trunk release to compile against his new extension-points feature.
Multiple object definitions
OCaml 4.01 also restricts multiple method definitions with the same name in the same object. This leaves inheritance as the only way to override method names, but some packages such as OCamlnet and Mlorg had minor uses of the old mechanism.
You can see this by using:

```
$ opam switch 4.00.1
$ eval `opam config env`
$ ocaml
# object method x = 1 method x = 2 end;;
- : < x : int > = <obj>

$ opam switch 4.01.0
$ eval `opam config env`
$ ocaml
# object method x = 1 method x = 2 end;;
Error: The method `x' has multiple definitions in this object
```
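Code that relied on the old behaviour can usually be ported by moving the first definition into a parent class and overriding it through inheritance, which works on both compiler versions. A minimal sketch:

```ocaml
(* 4.01-compatible override: inherit the first definition and replace
   it with method!, rather than defining [x] twice in one object *)
class base = object
  method x = 1
end

class derived = object
  inherit base
  method! x = 2   (* the ! asserts that an inherited [x] exists *)
end
```

The `!` annotation also gives you a compile error if the parent method you thought you were overriding ever disappears.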
New warnings, and the dreaded warnings-as-errors
After a decade of deprecation, the (&) and (or) operators finally had a warning turned on by default.
```
$ ocaml
# true or false;;
Warning 3: deprecated feature: operator (or); you should use (||) instead
- : bool = true
```
This wouldn’t normally be so bad, except that a surprising number of released packages also turn warnings into fatal errors (via the -w @ flags explained in the manual). Warnings-as-errors is extremely useful when developing code but rather harmful in released code, since a future compiler may emit new warnings that aren’t necessarily fatal bugs.
Packages that failed like this include ocamlgraph, spotlib, quickcheck, OPA, Lablgtk-extras and many more. Please make an effort not to leave this option turned on in your released packages, or it makes testing your code on bleeding-edge versions of the compiler much more difficult.
It’s worth noting here that OCaml 4.01 has introduced a fair number of new and very useful warnings across a number of areas, mainly to do with detecting unexpected ambiguity or shadowing of values. I’ll cover these in a future post about the new 4.01 goodies.
External system dependencies
While there are many packages in OPAM that are pure OCaml, a substantial number require other system tools or libraries to be installed. The Lablgtk GUI library, for example, obviously requires the C gtk library to be installed.
Determining if these libraries are installed on a particular OS is well beyond the scope of OPAM, as there are almost as many package managers as there are operating systems. However, it’s important for automated testing and user-friendly error messages to have some notion of detecting if the environment is ready for the OCaml package or not.
We’re solving this by using a depexts field in OPAM that consists of a set of tags identifying OS-specific packages that need to be present. A separate script can query these tags from OPAM and do the OS-specific tests or installation.
For example, here’s the sqlite3-ocaml OPAM description:

```
opam-version: "1"
maintainer: "email@example.com"
build: [
  ["ocaml" "setup.ml" "-configure"]
  ["ocaml" "setup.ml" "-build"]
  ["ocaml" "setup.ml" "-install"]
]
remove: [
  ["ocamlfind" "remove" "sqlite3"]
]
depends: ["ocamlfind"]
depexts: [
  [ ["debian"]  ["libsqlite3-dev"] ]
  [ ["freebsd"] ["database/sqlite3"] ]
  [ ["openbsd"] ["database/sqlite3"] ]
]
```
The depexts field here lists the APT package for Debian, and the ports tree locations for FreeBSD and OpenBSD. It could also list more specialised tags for particular versions of an OS. You can query it from OPAM as follows:
```
$ opam install -e debian sqlite3-ocaml
libsqlite3-dev
$ opam install -e openbsd sqlite3-ocaml
database/sqlite3
```
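The resolution step behind that query is simple: filter the tag table by the host OS and collect the matching package names. A minimal sketch of the idea (illustrative only; OPAM’s real implementation handles versioned and multi-tag matches):

```ocaml
(* hypothetical sketch: resolve a depexts-style tag table against
   the host OS to get the system packages that must be installed *)
let depexts = [
  ["debian"],  ["libsqlite3-dev"];
  ["freebsd"], ["database/sqlite3"];
  ["openbsd"], ["database/sqlite3"];
]

(* collect every package whose tag list mentions [os] *)
let packages_for os =
  List.fold_left
    (fun acc (tags, pkgs) -> if List.mem os tags then acc @ pkgs else acc)
    [] depexts
```

The result can then be fed straight to apt-get, pkg or pkg_add by an OS-specific wrapper script.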
OCamlot therefore needs to query the depexts field from the package and run the right pkg_add commands. I’ll write about this in more detail when it’s fully baked, as we’ve modified the semantics of the tag querying between OPAM 1.0 and OPAM 1.1 to make it easier to use in OCamlot.
Portable shell scripts
Once we’ve gotten past the hurdle of the compiler version causing failures, there is the small matter of testing non-Linux operating systems, as well as non-x86 CPU architectures. The #1029 overview lists many of these failures under the Portability section.
Damien Doligez made some excellent points about how to write portable Makefiles that work across both GNU and BSD make. This is why the carefully crafted OCaml Makefiles do not require GNU make to be installed when compiling on FreeBSD or OpenBSD (Mac OS X gave up the fight a long time ago and installs GNU make as its default).
OPAM tries to help out BSD by providing a make macro in opam files that is substituted with either "make" (by default) or "gmake" (for BSD). While this works for the toplevel invocation of the Makefile, it fails if the Makefile recursively invokes further targets by calling the command directly instead of using the $(MAKE) variable. Patching these sorts of things is easy but tedious: see the patchfile for the Facile constraint programming library for an example.
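The fix in such patches is typically a one-liner: recursive invocations must go through $(MAKE), so that whichever make binary started the build (gmake on BSD) is also the one that recurses. A minimal illustration (not Facile’s actual Makefile):

```make
# portable: $(MAKE) expands to the binary that invoked this Makefile,
# so "gmake all" on BSD recurses with gmake
all:
	$(MAKE) -C src all

# broken: hard-codes "make", so BSD make is invoked for the subdirectory
# even when the toplevel build was started with gmake
bad:
	make -C src all
```

GNU and BSD make both set $(MAKE) automatically, so this style costs nothing on Linux.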
The real problem here, of course, is that package maintainers cannot reasonably be expected to test their code on systems that they don’t normally use. If we demanded perfect portability for entry into the main OPAM repository, we wouldn’t get any submissions!
OCamlot automates this nicely though, by finding lots of portability bugs automatically, and maintainers are by and large very responsive when we report the problems upstream.
The emerging distributed workflow
The big drawback to OCamlot in its current form is the amount of triage effort it puts on the OPAM maintainers. The package database has now exceeded 500 packages in just a few short months, and has over 1500 unique versions that all need build testing and more accurate constraints. The wider community has been really keen to help with triage (just look at all the people who leapt in on bug #1029), so it’s our immediate priority to make OCamlot more transparent for people who want to use it to improve their own packages, and in the future also to use it to test various hypotheses about all the available open-source OCaml code (see Jacques’ experiment with monomorphic let as an example of something that can benefit from wider automated compilation).
I’ll talk more about how we’re solving this in my upcoming OCaml 2013 Workshop talk about the Platform. I don’t want to spoil it too much, but it involves a lovely distributed Git workflow, an enhanced opam2web, and a brand new metadata overlay system for OPAM that lets us enhance the package database with extra information such as statistics, portability and test results, without polluting the main Git repository with all this extra non-essential data.
If you’re really curious to know right now, then you can see the outline of the new system at Amir’s new ocaml.org wireframes blog post, where Part III contains the continuous integration workflow. A lot of infrastructure work has gone into building all of this over the summer, and now it’s all starting to be deployed in a very satisfying way…