(last updated on Dec 2013)
This time last year in 2012, I had just announced the formation of a new group called OCaml Labs in the Cambridge Computer Lab that would combine research and community work towards the practical application of functional programming. An incredible year has absolutely flown by, and I’ve put together this post to summarise what’s gone on, and point to our future directions for 2014.
The theme of our group was not to be pure research, but rather a hybrid group that would take on some of the load of day-to-day OCaml maintenance from INRIA, as well as help grow the wider OCaml community. To this end, all of our projects have been highly collaborative, often involving colleagues from OCamlPro, INRIA, Jane Street, Lexifi and Citrix.
At the start of 2013, OCaml was in the interesting position of being a mature decades-old language with a small, loyal community of industrial users who built mission critical applications using it. We had the opportunity to sit down with many of them at the OCaml Consortium meeting and prioritise where we started work. The answer came back clearly: while the compiler itself is legendary for its stability, the tooling around it (such as package management) was a pressing problem.
Our solution to this tooling was centered around the OPAM package manager that OCamlPro released into beta just at the end of 2012, and had its first stable release in March 2013. OPAM differs from most system package managers by emphasising a flexible distributed workflow that uses version constraints to ensure incompatible libraries aren’t mixed up (important for the statically-typed OCaml that is very careful about dependencies). Working closely with OCamlPro we developed a git-based workflow to make it possible for users (both individual or industrial) to easily build up their own package repositories and redistribute OCaml code, and started curating the package repository.
The results have been satisfying: we started with an initial set of around 100 packages in OPAM (mostly imported by the 4 developers), and ended 2013 with 587 unique packages and 2000 individual versions, with contributions from 160 individuals. We now have a curated central package repository for anyone to submit their OCaml code, several third-party remotes are maintained (e.g. the Xen Project and Ocsigen). We also regularly receive releases of the Core libraries from Jane Street, and updates from sources as varied as Facebook, Coherent PDF, to the Frenetic SDN research.
A notable contribution from OCamlPro during this time was to clarify the licensing on the package repository to be the liberal CC0, and also to pass ownership to the OCaml organization on GitHub, where it’s now jointly maintained by OCaml Labs, OCamlPro and anyone else that wishes to contribute.
It’s been quite interesting just watching all the varied code fly into the repository, but stability quickly became a concern as the new packages piled up. OCaml compiles to native code on not just x86, but also PowerPC, Sparc and ARM CPUs. We kicked off various efforts into automated testing: firstly David Sheets built the OCamlot daemon that would schedule builds across all the exotic hardware. Later in the year, the Travis service launched support for testing from GitHub pull requests, and this became the front line of automated checking for all incoming new packages to OPAM.
A major headache with automated testing is usually setting up the right build environment with external library dependencies, and so we added Docker support to make it easier to bulk-build packages for local developer use, with the results of builds available publically for anyone to help triage. Unfortunately fixing the bugs themselves is still a very manual process, so more volunteers are always welcome to help out!
We’re going to be really seeing the rewards from all this effort as OCaml 4.02 development proceeds, since we can now adopt a data-driven approach to changing language features instead of guessing how much third-party code will break. If your code is in OPAM, then it’ll be tested as new features such as module aliases, injectivity and extension points show up.
The venerable OCamlDoc tool has done an admirable job for the last decade, but is increasingly showing its age due to a lack of support for cross-referencing across packages. We started working on this problem in the summer when Vincent Botbol visited us on an internship, expecting it to be a quick job to come up with something as good as Haskell’s excellent Haddock online documentation.
Instead, we ran into the “module wall”: since OCaml makes it so easy to
parameterise code over other modules, it makes it hard to generate
static documentation without outputting hundreds of megabytes of HTML
every time. After some hard work from Vincent and Leo, we’ve got a
working prototype that lets you simply run
opam install opam-doc && opam doc core async to generate package
documentation. You can see the results for
Mirage online, but expect to see this
integrated into the main OCaml site for all OPAM packages as we work
through polishing up the user interface.
The other behind-the-scenes effort for OPAM has been to keep the core command-line tool simple and stable, and to have it install OCaml libraries that can be interfaced with by other tools to do domain-specific tasks. Thomas Gazagnaire, Louis Gesbert and David Sheets have been steadily hacking away at this and we now have opamfu to run operations over all packages, and an easy-to-template opam2web that generates the live opam.ocaml.org website.
This makes OPAM easier to deploy within other organizations that want to
integrate it into their workflow. For example, the software
section of the OCaml
Labs website is regularly generated from a search of all OPAM packages
ocamllabs. We also used it to rewrite the entire OPAM
repository in one epic
diff to add
external library dependencies via a command-line
All of this effort is geared towards making it easier to maintain reusable local OPAM installations. After several requests from big universities to help out their teaching needs, we’re putting together all the support needed to easily redistribute OPAM packages via an “OPAM-in-a-Box” command that uses Docker containers to let you clone and do lightweight modifications of OCaml installations.
This will also be useful for anyone who’d like to run tutorials or teach OCaml, without having to rely on flaky network connectivity at conference venues: a problem we’ve suffered from too!
Starting to work on a real compiler can often be a daunting prospect, and so one initiative we started this year is to host regular compiler hacking sessions where people could find a curated list of features to work on, with the regular developers at hand to help out when people get stuck, and free beer and pizza to oil the coding wheels. This has worked out well, with around 20 people showing up on average for the three we held, and several patches submitted upstream to OCaml. Gabriel Scherer and Damien Doligez have been helping this effort by tagging junior jobs in the OCaml Mantis bug tracker as they are filed.
Leo White started the year fresh out of completing his PhD with Alan Mycroft, and before he realized what he’d gotten himself into was working with Alain Frisch on the future of syntax transformations in OCaml. We started off our first wg-camlp4 working group on the new lists.ocaml.org host, and a spirited discussion started that went on and on for several months. It ended with a very satisfying design for a simpler extension points mechanism which Leo presented at the OCaml 2013 workshop at ICFP, and is now merged into OCaml 4.02-trunk.
Not all of the working groups were quite as successful in coming to a conclusion as the Camlp4 one. On the Platform mailing list, Gabriel Scherer started a discussion on the design for namespaces in OCaml. The resulting discussion was useful in separating multiple concerns that were intermingled in the initial proposal, and Leo wrote a comprehensive blog post on a proposed namespace design.
After further discussion at ICFP 2013 with Jacques Garrigue later in the year, it turns out adding support for module aliases would solve much of the cost associated with compiling large libraries such as Core, with no backwards compatibility issues. This solution has now been integrated into OCaml 4.02.0dev and is being tested with Core.
Jeremy Yallop joined us in April, and he and Leo also leapt into the core compiler and started triaging issues on the OCaml bug tracker. This seems unglamorous in the beginning, but there rapidly turned out to be many fascinating threads that shed light on OCaml’s design and implementation through seemingly harmless bugs. Here is a pick of some interesting threads through the year that we’ve been involved with:
opam switch 4.00.1+open-types.
This is just a sample of some of the issues solved in Mantis; if you want to learn more about OCaml, it’s well worth browsing through it to learn from over a decade of interesting discussions from all the developers.
While OCamlPro was working on their reentrant OCaml runtime, we took a different tack by adding thread-local storage to the runtime instead, courtesy of Stephen Dolan. This is an important choice to make at the outset of adding multicore, so both approaches are warranted. The preemptive runtime adds a lot of code churn (due to adding a context parameter to most function calls) and takes up a register, whereas the thread-local storage approach we tried doesn’t permit callbacks to different threads.
Much of this work isn’t interesting on its own, but forms the basis for a fully multicore runtime (with associated programming model) in 2014. Stay tuned!
One other complaint from the Consortium members was quite surprising: the difficulty of using the OCaml foreign function interface safely to interface with C code. Jeremy Yallop began working on the ctypes library that had the goal of eliminating the need to write any C code at all for the vast majority of foreign bindings.
Instead, Ctypes lets you describe any C function call as an OCaml value,
and provides various linkage options to invoke that function into C. The
first option he implemented was a
dlopen interface, which immediately
brought us the same level of functionality as the
equivalents. This early code was in itself startlingly useful and more
pleasant to use than the raw FFI, and various folk (such as David
cryptography bindings) started adopting it.
At this point, I happened to be struggling to write the Foreign Function
Interface chapter of Real World OCaml
without blowing through our page budget with a comprehensive explanation
of the existing system. I decided to take a risk and write about Ctypes
instead, since it let new users to the language have a far more
productive experience to get started. Xavier Leroy pointed out some
shortcomings of the
library in his technical book review, most notably with the lack of an
interface with C macros. The design of Ctypes fully supports alternate
linking mechanisms than just
dlopen though, and Jeremy has added
automatic C stub generation support as well. This means that if you use
Ctypes to build an OCaml binding in 2014, you can choose several
mechanisms for the same source code to link to the external system.
Jeremy even demonstrated a forking model at OCaml 2013 that protects the
OCaml runtime from the C binding via process separation.
The effort is paying off: Daniel Bünzli ported SDL2 using ctypes, and gave us extensive feedback about any missing corner cases, and the resulting bindings don’t require any C code to be written. Jonathan Protzenko even used it to implement an OCaml controller for the Adafruit Raspberry Pi RGB LCD!
Our community efforts were largely online, but we also hosted visitors over the year and regular face-to-face tutorials.
While the rest of the crew were hacking on OPAM and OCaml, Amir Chaudhry and Philippe Wang teamed up with Ashish Agarwal and Christophe Troestler to redesign and relaunch the OCaml website. Historically, OCaml’s homepage has been the caml.inria.fr domain, and the ocaml.org effort was begun by Christophe and Ashish some years ago to modernize the web presence.
The webpages were already rather large with complex scripting (for example, the 99 Problems page runs the OCaml code to autogenerate the output). Philippe developed a template DSL that made it easier to unify a lot of the templates around the website, and also a Markdown parser that we could link to as a library from the rest of the infrastructure without shelling out to Pandoc.
Meanwhile, Amir designed a series of interactive wireframe sketches and gathered feedback on it from the community. A local design agency in Cambridge helped with visual look and feel, and finally at the end of the summer we began the migration to the new website, followed by a triumphant switchover in November to the design you see today.
The domain isn’t just limited to the website itself. Leo and I set up a SVN-to-Git mirror of the OCaml compiler Subversion repository on the GitHub OCaml organization, which is proving popular with developers. There is an ongoing effort to simplify the core compiler tree by splitting out some of the larger components, and so camlp4 is also now hosted on that organization, along with OASIS. We also administer several subdomains of ocaml.org, such as the mailing lists and the OPAM repository, and other services such as the OCaml Forge are currently migrating over. This was made significantly easier thanks to sponsorship from Rackspace Cloud (users of XenServer which is written in OCaml). They saw our struggles with managing physical machines and gave us developer accounts, and all of the ocaml.org infrastructure is now hosted on Rackspace. We’re very grateful to their ongoing help!
If you’d like to contribute to infrastructure help (for example, I’m experimenting with a GitLab mirror), then please join the infrastructurecontacts/lists.ocaml.org mailing list and share your thoughts. The website team also need help with adding content and international translations, so head over to the website issue tracker and start proposing improvements you’d like to see.
The floodgates requesting features opened up after the launch of the new look and feel. Pretty much everyone wanted deeper OPAM integration into the main website, for features such as:
Many of these features were part of the original wireframes but we’re being careful to take a long-term view of how they should be created and maintained. Rather than building all of this as a huge bloated opam2web extension, David Sheets (our resident relucant-to-admit-it web expert) has designed an overlay directory scheme that permits the overlaying of different metadata onto the website. This lets one particular feature (such as blog post aggregation) be handled separately from the others via Atom aggregators.
A big effort that took up most of the year for me was finishing and publishing an O’Reilly book called Real World OCaml with Yaron Minsky and Jason Hickey. Yaron describes how it all started in his blog post, but I learnt a lot from developing a book using the open commenting scheme that we developed just for this.
In particular, the book ended up shining a bright light into dark language corners that we might otherwise not have explored in OCaml Labs. Two chapters of the book that I wasn’t satisfied with were the objects and classes chapters, largely since neither Yaron nor Jason nor I had ever really used their full power in our own code. Luckily, Leo White decided to pick up the baton and champion these oft-maligned (but very powerful) features of OCaml, and the result is the clearest explanation of them that I’ve read yet. Meanwhile, Jeremy Yallop helped out with extensive review of the Foreign Function Interface chapter that used his ctypes library. Finally, Jeremie Diminio at Jane Street worked hard on adding several features to his utop toplevel that made it compelling enough to become our default recommendation for newcomers.
All in all, we ended up closing over 2000 comments in the process of writing the book, and I’m very proud of the result (freely available online, but do buy a copy if you can to support it). Still, there’s more I’d like to do in 2014 to improve the ease of using OCaml further. In particular, I removed a chapter on packaging and build systems since I wasn’t happy with its quality, and both Thomas Gazagnaire and I intend to spend time in 2014 on improving this part of the ecosystem.
We had a lively presence at ICFP 2013 this year, with the third iteration of the OCaml 2013 held there, and Stephen Dolan presenting a paper in the main conference. I liveblogged OCaml 2013 and CUFP 2013 as they happened, and all the talks we gave are linked from the program. The most exciting part of the conference for a lot of us were the two talks by Facebook on their use of OCaml: first for program analysis using Pfff and then to migrate their massive PHP codebase using an OCaml compiler. I also had the opportunity to participate in a panel at the Haskell Workshop on whether Haskell is too big to fail yet; lots of interesting perspectives on scaling another formerly academic language into the real world.
Yaron Minsky and I have been giving tutorials on OCaml at ICFP for several years, but the release of Real World OCaml has made it significantly easier to give tutorials without the sort of labor intensity that it took in previous years (one memorable ICFP 2011 tutorial that we did took almost 2 hours to get everyone installed with OCaml. In ICFP 2013, it took us 15 minutes or so to get everyone started). Still, giving tutorials at ICFP is very much preaching to the choir, and so we’ve started speaking at more general-purpose events.
Since OCaml Labs is a normal group within the Cambridge Computer Lab, we often host academic visitors and interns who pass through. This year was certainly diverse, and we welcomed a range of colleagues:
We were also visited several times by Wojciech Meyer from ARM, who was an OCaml developer who maintained (among other things) the ocamlbuild system and worked on DragonKit (an extensible LLVM-like compiler written in OCaml). Wojciech very sadly passed away on November 18th, and we all fondly remember his enthusiastic and intelligent contributions to our small Cambridge community.
We also hosted visitors to live in Cambridge and work with us over the summer. In addition to Vincent Botbol (who worked on OPAM-doc as described earlier) we had the pleasure of having Daniel Bünzli and Xavier Clerc work here. Here’s what they did in their own words.
Xavier Clerc took a break from his regular duties at INRIA to join us over the summer to work on OCaml-Java and adapt it to the latest JVM features. This is an incredibly important project to bridge OCaml with the huge Java community, and here’s his report:
After a four-month visit to the OCaml Labs dedicated to the OCaml-Java project, the time has come for an appraisal! The undertaken work can be split into two areas: improvements to code generation, and interaction between the OCaml & Java languages. Regarding code generation, several classical optimizations have been added to the compiler, for example loop unrolling, more aggressive unboxing, better handling of globals, or partial evaluation (at the bytecode level). A new tool, namely ocamljar, has been introduced allowing post-compilation optimizations. The underlying idea is that some optimizations cannot always be applied (e.g. depending whether multiple threads/programs will coexist), but enabling them through command-line flags would lead to recompilation and/or multiple installations of each library according to the set of chosen optimizations. It is thus far more easier to first build an executable jar file, and then modify it according to these optimizations. Furthermore, this workflow allows the ocamljar tool to take advantage of whole-program information for some optimizations. All these improvements, combined, often lead to a gain of roughly 1/3 in terms of execution time.
Regarding language interoperability, there are actually two directions depending on whether you want to call OCaml code from Java, or want to call Java code from OCaml. For the first direction, a tool allows to generate Java source files from OCaml compiled interfaces, mapping the various constructs of the OCaml language to Java classes. It is then possible to call functions, and to manipulate instances of OCaml types in pure Java, still benefiting from the type safety provided by the OCaml language. In the other direction, an extension of the OCaml typer is provided allowing to create and manipulate Java instances directly from OCaml sources. This typer extension is indeed a thin layer upon the original OCaml typer, that is mainly responsible for encoding Java types into OCaml types. This encoding uses a number of advanced elements such as polymorphic variants, subtyping, variance annotations, phantom typing, and printf-hack, but the end-user does not have to be aware of this encoding. On the surface, the type of instances of the Java Object classes is
java'lang'Object java_instance, and instances can be created by calling Java.make
Daniel joined us from Switzerland, and spent some time at Citrix before joining us in OCaml Labs. All of his software is now on OPAM, and is seeing ever-increasing adoption from the community.
Released a first version of Vg […] I’m especially happy about that as I wanted to use and work on these ideas since at least 2008. The project is a long term project and is certainly not finished yet but this is already a huge step.
Adjusted and released a first version of Gg. While the module was already mostly written before my arrival to Cambridge, the development of Vg and Vz prompted me to make some changes to the module.
[…] released Otfm, a module to decode OpenType fonts. This is a work in progress as not every OpenType table has built-in support for decoding yet. But since it is needed by Vg’s PDF renderer I had to cut a release. It can however already be used to implement certain simple things like font kerning with Vg, this can be seen in action in the
vechobinary installed by Vg.
Started to work on Vz, a module for helping to map data to Vg images. This is really unfinished and is still considered to be at a design stage. There are a few things that are however well implemented like (human) perceptually meaningful color palettes and the small folding stat module (
Vz.Stat). However it quickly became evident that I needed to have more in the box w.r.t. text rendering in Vg/Otfm. Things like d3js entirely rely on the SVG/CSS support for text which makes it easy to e.g. align things (like tick labels on such drawings). If you can’t rely on that you need ways of measuring rendered text. So I decided to suspend the work on Vz and put more energy in making a first good release of Vg. Vz still needs quite some design work, especially since it tries to be independent of Vg’s backend and from the mechanism for user input.
Spent some time figuring out a new “opam-friendly” release workflow in pkgopkg. One of my problem is that by designing in the small for programming in the large — what a slogan — the number of packages I’m publishing is growing (12 and still counting). This means that I need to scale horizontally maintenance-wise unhelped by the sad state of build systems for OCaml. I need tools that make the release process flawless, painless and up to my quality standards. This lead me to enhance and consolidate my old scattered distribution scripts in that repo, killing my dependencies on Oasis and ocamlfind along the way. (edited for brevity, see here)
Daniel also left his bicycle here for future visitors to use, and the “Bünzli-bike” is available for our next visitor! (Louis Gesbert even donated lights, giving it a semblance of safety).
Most of our regular funding bodies such as EPSRC or EU FP7 provide funding, but leave all the intellectual input to the academics. A compelling aspect of OCaml Labs has been how involved our industrial colleagues have been with the day-to-day problems that we solve. Both Jane Street and Citrix have senior staff regularly visiting our group and working alongside us as industrial fellows in the Computer Lab.
The other 100% of our time at the Labs is spent on research projects. When we started the group, I wanted to set up a feedback loop between local people using OCaml to build systems, with the folk developing OCaml itself. This has worked out particularly well with a couple of big research projects in the Lab.
Mirage is a library operating system written in OCaml that compiles source code into specialised Xen microkernels, developed at the Cambridge Computer Lab, Citrix and the Horizon Digital Economy institute at Nottingham. This year saw several years of effort culminate in the first release of Mirage 1.0 as a self-hosting entity. While Mirage started off as a quick experiment into building specialised virtual appliances, it rapidly became useful to make into a real system for use in bigger research projects. You can learn more about Mirage here, or read the Communications of the ACM article that Dave Scott and I wrote to close out the year.
This project is where the OCaml Labs “feedback loop” has been strongest. A typical Mirage application consists of around 50 libraries that are all installed via OPAM. These range from device drivers to protocol libraries for HTTP or DNS, to filesystems such as FAT32. Coordinating regular releases of all of these would be near impossible without using OPAM, and has also forced us to use our own tools daily, helping to sort out bugs more quickly. You can see the full list of libraries on the OCaml Labs software page.
Mirage is also starting to share code with big projects such as XenServer now, and we have been working with Citrix engineers to help them to move to the Core library that Jane Street has released (and that is covered in Real World OCaml). Moving production codebases this large can take years, but OCaml Labs is turning out to be a good place to start unifying some of the bigger users of OCaml into one place. We’re also now an official Xen Project incubator project, which helps us to validate functional programming to other Linux Foundation efforts.
The release of Mirage 1.0 has put us on the road to simplifying embedded systems programming. The move to the centralized cloud has led to regular well-publicised privacy and security threats to the way we handle our digital infrastructure, and so Jon Crowcroft, Richard Mortier and I are leading an effort to build an alternative privacy-preserving infrastructure using embedded devices as part of the User Centric Networking project, in collaboration with a host of companies led by Technicolor Paris. This work also plays on the strong points of OCaml: it already has a fast ARM backend, and Mirage can easily be ported to the new Xen/ARM target as hardware becomes available.
One of the most difficult aspects of programming on the “wide area” Internet are dealing with the lack of a distributed identity service that’s fully secure. We published our thoughts on this at the USENIX Free and Open Communications on the Internet workhsop, and David Sheets is working towards a full implementation using Mirage. If you’re interested in following this effort, Amir Chaudhry is blogging at the Nymote project website, where we’ll talk about the components as they are released.
At the other extreme from embedded programming is datacenter networking, and we started the Network-as-a-Service research project with Imperial College and Nottingham. With the rapid rise of Software Defined Networking this year, we are investigating how application-specific customisation of network resources can build fast, better, cheaper infrasructure. OCaml is in a good position here: several other groups have built OpenFlow controllers in OCaml (most notably, the Frenetic Project), and Mirage is specifically designed to assemble such bespoke infrastructure.
Another aspect we’ve been considering is how to solve the problem of optimal connectivity across nodes. TCP is increasingly considered harmful in high-through, high-density clusters, and George Parisis led the design of Trevi, which is a fountain-coding based alternative for storage networking. Meanwhile, Thomas Gazagnaire (who joined OCaml Labs in November), has been working on a branch-consistent data store called Irminsule which supports scalable data sharing and reconciliation using Mirage. Both of these systems will see implementations based on the research done this year.
Jeremy Yallop and Leo White have been developing an approach that makes it possible to write programs with higher-kinded polymorphism (such as monadic functions that are polymorphic in the monad they use) without using functors. It’s early days yet, but there’s a library available on OPAM that implements the approach, and a draft paper that outlines the design.
This year has been a wild ride to get us up to speed, but we now have a solid sense of what to work on for 2014. We’ve decided on a high-level set of priorities led by the senior members of the group:
These are guidelines to choosing where to spend our time, but not excluding other work or day-to-day bugfixing. Our focus on collaboration with Jane Street, Citrix, Lexifi, OCamlPro and our existing colleagues will continue, as well as warmly welcoming new community members that wish to work with us on any of the projects, either via internships, studentships or good old-fashioned open source hacking.
I appreciate the whole team’s feedback in editing this long post into shape, the amazing professorial support from Jon Crowcroft, Ian Leslie and Alan Mycroft throughout the year, and of course the funding and support from Jane Street, Citrix, RCUK, EPSRC, DARPA and the EU FP7 that made all this possible. Roll on 2014, and please do get in touch with me with any queries!