home · projects · papers · blog · gallery · contact
anil madhavapeddy // anil.recoil.org

Grepping the source of every OCaml package in OPAM

08 April 2014   |   Anil Madhavapeddy   |   tags: opam,ocaml,ocamllabs   |   all posts

A regular question that comes up from OCaml developers is how to use OPAM as a hypothesis testing tool against the known corpus of OCaml source code. In other words: can we quickly and simply run grep over every source archive in OPAM? So that’s the topic of today’s 5 minute blog post:

git clone git://github.com/ocaml/opam-repository
cd opam-repository
opam-admin make
cd archives
for i in *.tar.gz; \
  do tar -zxOf $i | grep caml_stat_alloc_string; \
done

In this particular example we’re looking for instances of caml_stat_alloc_string, so just replace that with the regular expression of your choice. The opam-admin tool repacks upstream archives into a straightforward tarball, so you don’t need to worry about all the different archival formats that OPAM supports (such as git or Darcs). It just adds an archive directory to a normal opam-repository checkout, so you can reuse an existing checkout if you have one already.

$ cd opam-repository/archives
$ du -h
669M	.
$ ls | wc -l
   2092

Codio, the insanely slick web way to build Mirage unikernels from a browser

26 March 2014   |   Anil Madhavapeddy   |   tags: mirage,ocaml,ocamllabs   |   all posts

I noticed an offhand tweet from Phil Tomson about Codio adding OPAM support, and naturally had to take a quick look. I was really impressed by the whole process, and ended up building the Mirage Xen website unikernel directly from my web browser in less than a minute, including registration!

I notice Codio supports OCaml and opam on the server side now.

— phil tomson (@philtor) March 26, 2014
$ parts install opam
$ opam init -a
$ eval `opam config env`
$ opam install mirage-www -y
$ make MODE=xen

Then have a cup of coffee while the box builds, and you have a mir-www.xen, all from your web browser! Codio has a number of deployment options available too, so you should be able to hook up a Git-based workflow using some combination of Travis or other CI service.

This is the first time I’ve ever been impressed by an online editor, and might consider moving away from my beloved vi…


Easily OPAM switching to any OCaml feature request

25 March 2014   |   Anil Madhavapeddy   |   tags: ocaml,ocamllabs   |   all posts

Gabriel Scherer announced an experiment to host OCaml compiler pull requests on GitHub for six months. There is a general feeling that GitHub would be a more modern hosting platform than the venerable but reliable Mantis setup that has in place for over a decade, but the only way to find out for sure is by trying it out for a while.

One of the great benefits of using GitHub is their excellent API to easily automate workflows around issues and pull requests. After a suggestion from Jeremy Yallop and David Sheets over lunch, I decided to use this to make it easier to locally apply compiler patches. OPAM has a great compiler switch feature that lets you run simultaneous OCaml installations and swap between them easily. For instance, the default setting gives you access to:

$ opam switch
system  C system       System compiler (4.01.0)
--     -- 3.11.2       Official 3.11.2 release
--     -- 3.12.1       Official 3.12.1 release
--     -- 4.00.0       Official 4.00.0 release
--     -- 4.00.1       Official 4.00.1 release
--     -- 4.01.0       Official 4.01.0 release
--     -- 4.01.0beta1  Beta1 release of 4.01.0

I used my GitHub API bindings to knock up a script that converts every GitHub pull request into a custom compiler switch. You can see these by passing the --all option to opam switch, as follows:

$ opam switch --all
--     -- 4.02.0dev+pr10              Add String.{split,rsplit}
--     -- 4.02.0dev+pr13              Add String.{cut,rcut}.
--     -- 4.02.0dev+pr14              Add absolute directory names to bytecode format for ocamldebug to use
--     -- 4.02.0dev+pr15              replace String.blit by String.unsafe_blit
--     -- 4.02.0dev+pr17              Cmm arithmetic optimisations
--     -- 4.02.0dev+pr18              Patch for issue 5584
--     -- 4.02.0dev+pr2               Parse -.x**2. (unary -.) as -.(x**2.).  Fix PR#3414
--     -- 4.02.0dev+pr20              OCamlbuild: Fix the check of ocamlfind
--     -- 4.02.0dev+pr3               Extend record punning to allow destructuring.
--     -- 4.02.0dev+pr4               Fix for PR#4832 (Filling bigarrays may block out runtime)
--     -- 4.02.0dev+pr6               Warn user when a type variable in a type constraint has been instantiated.
--     -- 4.02.0dev+pr7               Extend ocamllex with actions before refilling
--     -- 4.02.0dev+pr8               Adds a .gitignore to ignore all generated files during `make world.opt'
--     -- 4.02.0dev+pr9               FreeBSD 10 uses clang by default, with gcc not available by default
--     -- 4.02.0dev+trunk             latest trunk snapshot

Testing the impact of a particular compiler switch is now pretty straightforward. If you want to play with Stephen Dolan’s optimized arithmetic operations, for instance, you just need to do:

$ opam switch 4.02.0dev+pr17
$ eval `opam config env`

And your local environment now points to the patched OCaml compiler. For the curious, the scripts to generate the OPAM pull requests are in my avsm/opam-sync-github-prs repository. It contains an example of how to query active pull requests, and also to create a new cross-repository pull request (using the git jar binary from my GitHub bindings). The scripts run daily for now, and delete switches once the corresponding pull request is closed. Just run opam update to retrieve the latest switch set from the upstream OPAM package repository.


ICFP 2014 - a call for sponsorship and how you can help

03 March 2014   |   Anil Madhavapeddy   |   tags: icfp   |   all posts

The call for papers for this year’s International Conference on Functional Programming has just closed, with around a hundred cutting edge research papers submitted on the theory, application and experiences behind functional programming. This marks just the beginning of sorting out the program, as there are also over 10 big affiliated workshops that run throughout the week on topics ranging from specific languages (Erlang, Haskell, OCaml), the broader commercial community, and even art and music.

The ICFP conference experience can be a remarkable one for students. Some great ideas have emerged from random corridor conversations between talks with the likes of Phil Wadler, or from rain-soaked discussions with Simon PJ at Mikeller, or in my case, from being convinced to write a book while in a smoky Tokyo bar.

Functional programming worldwide has been growing ever more popular in 2014 (and lucrative). We’re committed to growing the ICFP community, not just in numbers but also in diversity. We had a record number of sponsors in 2013, and sustaining the growth means that we need to reach ever wider to support the activities of the (not-for-profit) conference.

So as this year’s industrial relations chair, I thought I’d throw the gates open and invite any organization that wishes to support FP to get in touch with us (e-mail at avsm2@cl.cam.ac.uk) and sponsor us. I’ve put an abridged version of the e-mail solicitation below that describes the benefits. Sponsorship can start as low as $500 and is often tax deductible in many countries.

I’m writing to ask if you would be willing to provide corporate financial support for the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP), which takes place in Gothenburg, Sweden, from September 1st through 3rd, 2014:

http://icfpconference.org/icfp2014/

Corporate support funds are primarily used to subsidize students – the lifeblood of our community – and in turn serve to raise the community profile of the supporting companies through a high-profile industrial recruitment event.

Last year, unprecedented levels of support from you and folks like you at over 25 companies and institutions made it possible for students from all over the world to attend ICFP 2013 in Boston. The Industrial Reception, open to all attendees, was by all accounts a roaring success. All 2013 sponsoring companies had the opportunity to speak to the gathered students, academics, and software professionals.

This year, let’s build on that success and continue to grow our community, and bring even more students to ICFP 2014 in Sweden!

Your generosity will make it possible for students from all over the world to attend ICFP, the premier conference in functional programming. There, they will meet luminaries in the field, as well as people who’ve built a successful career and/or business on functional programming. They will return home inspired to continue pursuing functional programming in the confidence that exciting future careers await them.

This year, we’re continuing similar system of levels of financial support as last year. Our goal is to enable smaller companies to contribute while allowing larger companies to be as generous as they wish (with additional benefits, in recognition of that generosity).

The support levels, and their associated benefits and pledge amounts and benefits are as follows (costs in US dollars).

Bronze: $500: Logo on website, poster at industrial reception, listed in proceedings.

Silver: $2500: As above plus: logo in proceedings, logo on publicity materials (e.g., posters, etc.)

Gold: $5000: As above plus: named supporter of industrial reception, opportunity to include branded merchandise in participants’ swag bag.

Platinum: $10000: As above plus: named supporter of whole event, logo on lanyards, badge ribbon, table/booth-like space available (in coffee break areas), other logo on lanyards, badge ribbon, table/booth-like space available (in coffee break areas), other negotiated benefits (subject to ACM restrictions on commercial involvement).

If you are interested, please get in touch with me or any of the organizing committee. If you’re interested in helping out ICFP in a non-financial capacity (for example as a student volunteer), then there will also be plenty of opportunity to sign up later in the year.


Unikernels, and the Rise of the Virtual Library Operating System

13 January 2014   |   Anil Madhavapeddy   |   tags: ocaml,ocamllabs.mirage   |   all posts

The Communications of the ACM have just published an article that Dave Scott and I wrote providing a broader background on the concept of Unikernels that we’ve been working on since about 2003, when we started building Melange and the Xen toolstack. You can read either the print article (requires an ACM subscription) or the open access version on the ACM Queue. There’s been some interesting discussion about it already online:

Two of the most interesting bits of feedback for me personally came from Butler Lampson (via Jon Crowcroft) and Robert Harper, two computer scientists who have made key contributions to operating systems and programming languages and provided some broader perspective.

Butler Lampson points out (edited for the web):

I found the Mirage work quite interesting: a 21st century version of things that we did at Xerox in the 1970s. Of course the application domain is quite different, and so is the whole-program optimization. And we couldn’t afford garbage collection, so freeing storage was not type-safe. But there are lots of interesting parallels.

The “OS as libraries” idea was what made it possible to fit big applications into the Alto’s 128k bytes of memory:

Lampson and Sproull, An open operating system for a single-user machine, ACM Operating Systems Rev. 11, 5 (Dec. 1979), pp 98-105. ACM.

The use of strong type-checking and interfaces for an OS was pioneered in Mesa and Pilot:

Lauer and Satterthwaite, The impact of Mesa on system design, Proc. 4th ICSE, Munich, Sep. 1979, pp 174-182.

Redell et al, Pilot: An Operating System for a Personal Computer, Comm. ACM 23, 2 (Feb 1980), pp 81-92 (from 7th SOSP, 1979). ACM

Robert Harper correctly points out some related work that was missing from our CACM article:

Both Ensemble and FoxNet made strong echoes throughout the design of Mirage (and its precursor software such as Melange in 2007). The Mirage command-line tool uses staged-computation to build a concrete application out of functors, and we are making this even more programmable via a new combinator-based functor types library that Thomas Gazagnaire built, and also experimenting with higher kinded polymorphic abstractions.

My thanks to Butler Lampson and Robert Harper for making me go re-read their papers again, and I’d like to leave you with Malte Schwarzkopf’s OS Reading Group papers for other essential reading in this space. Many more citations immediately relevant to Mirage can also be found in our ASPLOS 2013 paper.


Reviewing the first year of OCaml Labs in 2013

29 December 2013   |   Anil Madhavapeddy   |   tags: ocamllabs,ocaml   |   post syndicated from OCaml Labs   |   all posts

This time last year in 2012, I had just announced the formation of a new group called OCaml Labs in the Cambridge Computer Lab that would combine research and community work towards the practical application of functional programming. An incredible year has absolutely flown by, and I’ve put together this post to summarise what’s gone on, and point to our future directions for 2014.

The theme of our group was not to be pure research, but rather a hybrid group that would take on some of the load of day-to-day OCaml maintenance from INRIA, as well as help grow the wider OCaml community. To this end, all of our projects have been highly collaborative, often involving colleagues from OCamlPro, INRIA, Jane Street, Lexifi and Citrix.

This post covers progress in tooling, the compiler and language, community efforts, research projects and concludes with our priorities for 2014.

Tooling

At the start of 2013, OCaml was in the interesting position of being a mature decades-old language with a small, loyal community of industrial users who built mission critical applications using it. We had the opportunity to sit down with many of them at the OCaml Consortium meeting and prioritise where we started work. The answer came back clearly: while the compiler itself is legendary for its stability, the tooling around it (such as package management) was a pressing problem.

OPAM

Our solution to this tooling was centered around the OPAM package manager that OCamlPro released into beta just at the end of 2012, and had its first stable release in March 2013. OPAM differs from most system package managers by emphasising a flexible distributed workflow that uses version constraints to ensure incompatible libraries aren’t mixed up (important for the statically-typed OCaml that is very careful about dependencies). Working closely with OCamlPro we developed a git-based workflow to make it possible for users (both individual or industrial) to easily build up their own package repositories and redistribute OCaml code, and started curating the package repository.

The results have been satisfying: we started with an initial set of around 100 packages in OPAM (mostly imported by the 4 developers), and ended 2013 with 587 unique packages and 2000 individual versions, with contributions from 160 individuals. We now have a curated central package repository for anyone to submit their OCaml code, several third-party remotes are maintained (e.g. the Xen Project and Ocsigen). We also regularly receive releases of the Core libraries from Jane Street, and updates from sources as varied as Facebook, Coherent PDF, to the Frenetic SDN research.


Number of unique contributors to the central OPAM package repository.

Total number of unique packages (including multiple versions of the same package).

Total packages with multiple versions coalesced so you can see new package growth.

A notable contribution from OCamlPro during this time was to clarify the licensing on the package repository to be the liberal CC0, and also to pass ownership to the OCaml organization on GitHub, where it’s now jointly maintained by OCaml Labs, OCamlPro and anyone else that wishes to contribute.

A lens into global OCaml code

It’s been quite interesting just watching all the varied code fly into the repository, but stability quickly became a concern as the new packages piled up. OCaml compiles to native code on not just x86, but also PowerPC, Sparc and ARM CPUs. We kicked off various efforts into automated testing: firstly David Sheets built the OCamlot daemon that would schedule builds across all the exotic hardware. Later in the year, the Travis service launched support for testing from GitHub pull requests, and this became the front line of automated checking for all incoming new packages to OPAM.

A major headache with automated testing is usually setting up the right build environment with external library dependencies, and so we added Docker support to make it easier to bulk-build packages for local developer use, with the results of builds available publically for anyone to help triage. Unfortunately fixing the bugs themselves is still a very manual process, so more volunteers are always welcome to help out!

We’re going to be really seeing the rewards from all this effort as OCaml 4.02 development proceeds, since we can now adopt a data-driven approach to changing language features instead of guessing how much third-party code will break. If your code is in OPAM, then it’ll be tested as new features such as module aliases, injectivity and extension points show up.

Better documentation

The venerable OCamlDoc tool has done an admirable job for the last decade, but is increasingly showing its age due to a lack of support for cross-referencing across packages. We started working on this problem in the summer when Vincent Botbol visited us on an internship, expecting it to be a quick job to come up with something as good as Haskell’s excellent Haddock online documentation.

Instead, we ran into the “module wall”: since OCaml makes it so easy to parameterise code over other modules, it makes it hard to generate static documentation without outputting hundreds of megabytes of HTML every time. After some hard work from Vincent and Leo, we’ve got a working prototype that lets you simply run opam install opam-doc && opam doc core async to generate package documentation. You can see the results for Mirage online, but expect to see this integrated into the main OCaml site for all OPAM packages as we work through polishing up the user interface.

Turning OPAM into libraries

The other behind-the-scenes effort for OPAM has been to keep the core command-line tool simple and stable, and to have it install OCaml libraries that can be interfaced with by other tools to do domain-specific tasks. Thomas Gazagnaire, Louis Gesbert and David Sheets have been steadily hacking away at this and we now have opamfu to run operations over all packages, and an easy-to-template opam2web that generates the live opam.ocaml.org website.

This makes OPAM easier to deploy within other organizations that want to integrate it into their workflow. For example, the software section of the OCaml Labs website is regularly generated from a search of all OPAM packages tagged ocamllabs. We also used it to rewrite the entire OPAM repository in one epic diff to add external library dependencies via a command-line shim.

OPAM-in-a-Box

All of this effort is geared towards making it easier to maintain reusable local OPAM installations. After several requests from big universities to help out their teaching needs, we’re putting together all the support needed to easily redistribute OPAM packages via an ”OPAM-in-a-Box” command that uses Docker containers to let you clone and do lightweight modifications of OCaml installations.

This will also be useful for anyone who’d like to run tutorials or teach OCaml, without having to rely on flaky network connectivity at conference venues: a problem we’ve suffered from too!

Core Compiler

Starting to work on a real compiler can often be a daunting prospect, and so one initiative we started this year is to host regular compiler hacking sessions where people could find a curated list of features to work on, with the regular developers at hand to help out when people get stuck, and free beer and pizza to oil the coding wheels. This has worked out well, with around 20 people showing up on average for the three we held, and several patches submitted upstream to OCaml. Gabriel Scherer and Damien Doligez have been helping this effort by tagging junior jobs in the OCaml Mantis bug tracker as they are filed.

Syntax transformations and extension points

Leo White started the year fresh out of completing his PhD with Alan Mycroft, and before he realized what he’d gotten himself into was working with Alain Frisch on the future of syntax transformations in OCaml. We started off our first wg-camlp4 working group on the new lists.ocaml.org host, and a spirited discussion started that went on and on for several months. It ended with a very satisfying design for a simpler extension points mechanism which Leo presented at the OCaml 2013 workshop at ICFP, and is now merged into OCaml 4.02-trunk.

Namespaces

Not all of the working groups were quite as successful in coming to a conclusion as the Camlp4 one. On the Platform mailing list, Gabriel Scherer started a discussion on the design for namespaces in OCaml. The resulting discussion was useful in separating multiple concerns that were intermingled in the initial proposal, and Leo wrote a comprehensive blog post on a proposed namespace design.

After further discussion at ICFP 2013 with Jacques Garrigue later in the year, it turns out adding support for module aliases would solve much of the cost associated with compiling large libraries such as Core, with no backwards compatibility issues. This solution has now been integrated into OCaml 4.02.0dev and is being tested with Core.

Delving into the bug tracker

Jeremy Yallop joined us in April, and he and Leo also leapt into the core compiler and started triaging issues on the OCaml bug tracker. This seems unglamorous in the beginning, but there rapidly turned out to be many fascinating threads that shed light on OCaml’s design and implementation through seemingly harmless bugs. Here is a pick of some interesting threads through the year that we’ve been involved with:

This is just a sample of some of the issues solved in Mantis; if you want to learn more about OCaml, it’s well worth browsing through it to learn from over a decade of interesting discussions from all the developers.

Thread-local storage runtime

While OCamlPro was working on their reentrant OCaml runtime, we took a different tack by adding thread-local storage to the runtime instead, courtesy of Stephen Dolan. This is an important choice to make at the outset of adding multicore, so both approaches are warranted. The preemptive runtime adds a lot of code churn (due to adding a context parameter to most function calls) and takes up a register, whereas the thread-local storage approach we tried doesn’t permit callbacks to different threads.

Much of this work isn’t interesting on its own, but forms the basis for a fully multicore runtime (with associated programming model) in 2014. Stay tuned!

Ctypes

One other complaint from the Consortium members was quite surprising: the difficulty of using the OCaml foreign function interface safely to interface with C code. Jeremy Yallop began working on the ctypes library that had the goal of eliminating the need to write any C code at all for the vast majority of foreign bindings.

Instead, Ctypes lets you describe any C function call as an OCaml value, and provides various linkage options to invoke that function into C. The first option he implemented was a dlopen interface, which immediately brought us the same level of functionality as the Python or Haskell Ctypes equivalents. This early code was in itself startlingly useful and more pleasant to use than the raw FFI, and various folk (such as David Sheets’ libsodium cryptography bindings) started adopting it.

At this point, I happened to be struggling to write the Foreign Function Interface chapter of Real World OCaml without blowing through our page budget with a comprehensive explanation of the existing system. I decided to take a risk and write about Ctypes instead, since it let new users to the language have a far more productive experience to get started. Xavier Leroy pointed out some shortcomings of the library in his technical book review, most notably with the lack of an interface with C macros. The design of Ctypes fully supports alternate linking mechanisms than just dlopen though, and Jeremy has added automatic C stub generation support as well. This means that if you use Ctypes to build an OCaml binding in 2014, you can choose several mechanisms for the same source code to link to the external system. Jeremy even demonstrated a forking model at OCaml 2013 that protects the OCaml runtime from the C binding via process separation.

The effort is paying off: Daniel Bünzli ported SDL2 using ctypes, and gave us extensive feedback about any missing corner cases, and the resulting bindings don’t require any C code to be written. Jonathan Protzenko even used it to implement an OCaml controller for the Adafruit Raspberry Pi RGB LCD!

Community Efforts

Our community efforts were largely online, but we also hosted visitors over the year and regular face-to-face tutorials.

Online at OCaml.org

While the rest of the crew were hacking on OPAM and OCaml, Amir Chaudhry and Philippe Wang teamed up with Ashish Agarwal and Christophe Troestler to redesign and relaunch the OCaml website. Historically, OCaml’s homepage has been the caml.inria.fr domain, and the ocaml.org effort was begun by Christophe and Ashish some years ago to modernize the web presence.

The webpages were already rather large with complex scripting (for example, the 99 Problems page runs the OCaml code to autogenerate the output). Philippe developed a template DSL that made it easier to unify a lot of the templates around the website, and also a Markdown parser that we could link to as a library from the rest of the infrastructure without shelling out to Pandoc.

Meanwhile, Amir designed a series of interactive wireframe sketches and gathered feedback on it from the community. A local design agency in Cambridge helped with visual look and feel, and finally at the end of the summer we began the migration to the new website, followed by a triumphant switchover in November to the design you see today.

The domain isn’t just limited to the website itself. Leo and I set up a SVN-to-Git mirror of the OCaml compiler Subversion repository on the GitHub OCaml organization, which is proving popular with developers. There is an ongoing effort to simplify the core compiler tree by splitting out some of the larger components, and so camlp4 is also now hosted on that organization, along with OASIS. We also administer several subdomains of ocaml.org, such as the mailing lists and the OPAM repository, and other services such as the OCaml Forge are currently migrating over. This was made significantly easier thanks to sponsorship from Rackspace Cloud (users of XenServer which is written in OCaml). They saw our struggles with managing physical machines and gave us developer accounts, and all of the ocaml.org infrastructure is now hosted on Rackspace. We’re very grateful to their ongoing help!

If you’d like to contribute to infrastructure help (for example, I’m experimenting with a GitLab mirror), then please join the infrastructure@lists.ocaml.org mailing list and share your thoughts. The website team also need help with adding content and international translations, so head over to the website issue tracker and start proposing improvements you’d like to see.

Next steps for ocaml.org

The floodgates requesting features opened up after the launch of the new look and feel. Pretty much everyone wanted deeper OPAM integration into the main website, for features such as:

Many of these features were part of the original wireframes but we’re being careful to take a long-term view of how they should be created and maintained. Rather than building all of this as a huge bloated opam2web extension, David Sheets (our resident relucant-to-admit-it web expert) has designed an overlay directory scheme that permits the overlaying of different metadata onto the website. This lets one particular feature (such as blog post aggregation) be handled separately from the others via Atom aggregators.

 Real World OCaml

A big effort that took up most of the year for me was finishing and publishing an O’Reilly book called Real World OCaml with Yaron Minsky and Jason Hickey. Yaron describes how it all started in his blog post, but I learnt a lot from developing a book using the open commenting scheme that we developed just for this.

In particular, the book ended up shining a bright light into dark language corners that we might otherwise not have explored in OCaml Labs. Two chapters of the book that I wasn’t satisfied with were the objects and classes chapters, largely since neither Yaron nor Jason nor I had ever really used their full power in our own code. Luckily, Leo White decided to pick up the baton and champion these oft-maligned (but very powerful) features of OCaml, and the result is the clearest explanation of them that I’ve read yet. Meanwhile, Jeremy Yallop helped out with extensive review of the Foreign Function Interface chapter that used his ctypes library. Finally, Jeremie Diminio at Jane Street worked hard on adding several features to his utop toplevel that made it compelling enough to become our default recommendation for newcomers.

All in all, we ended up closing over 2000 comments in the process of writing the book, and I’m very proud of the result (freely available online, but do buy a copy if you can to support it). Still, there’s more I’d like to do in 2014 to improve the ease of using OCaml further. In particular, I removed a chapter on packaging and build systems since I wasn’t happy with its quality, and both Thomas Gazagnaire and I intend to spend time in 2014 on improving this part of the ecosystem.

Tutorials and Talks

We had a lively presence at ICFP 2013 this year, with the third iteration of the OCaml 2013 held there, and Stephen Dolan presenting a paper in the main conference. I liveblogged OCaml 2013 and CUFP 2013 as they happened, and all the talks we gave are linked from the program. The most exciting part of the conference for a lot of us were the two talks by Facebook on their use of OCaml: first for program analysis using Pfff and then to migrate their massive PHP codebase using an OCaml compiler. I also had the opportunity to participate in a panel at the Haskell Workshop on whether Haskell is too big to fail yet; lots of interesting perspectives on scaling another formerly academic language into the real world.

Yaron Minsky and I have been giving tutorials on OCaml at ICFP for several years, but the release of Real World OCaml has made it significantly easier to give tutorials without the sort of labor intensity that it took in previous years (one memorable ICFP 2011 tutorial that we did took almost 2 hours to get everyone installed with OCaml. In ICFP 2013, it took us 15 minutes or so to get everyone started). Still, giving tutorials at ICFP is very much preaching to the choir, and so we’ve started speaking at more general-purpose events.


Julien Verlaguet and Yoann Padioleau show off Pfff code visualisation at Facebook.

Marius Eriksen and Yaron Minsky start a Scala vs OCaml rap battle at the ICFP industrial fair. Maybe.

A successful FPDays tutorial in Cambridge, with all attendees getting a free copy of RWO!

Our first local effort was FPDays in Cambridge, where Jeremy Yallop and Amir Chaudhry ran the tutorial with help from Phillipe Wang, Leo White and David Sheets. The OCaml session there ended up being the biggest one in the entire two days, and Amir wrote up their experiences. One interesting change from our ICFP tutorial is that Jeremy used js_of_ocaml to teach OCaml via JavaScript by building a fun Monty Hall game.

Visitors and Interns

Since OCaml Labs is a normal group within the Cambridge Computer Lab, we often host academic visitors and interns who pass through. This year was certainly diverse, and we welcomed a range of colleagues:

We were also visited several times by Wojciech Meyer from ARM, who was an OCaml developer who maintained (among other things) the ocamlbuild system and worked on DragonKit (an extensible LLVM-like compiler written in OCaml). Wojciech very sadly passed away on November 18th, and we all fondly remember his enthusiastic and intelligent contributions to our small Cambridge community.

We also hosted visitors to live in Cambridge and work with us over the summer. In addition to Vincent Botbol (who worked on OPAM-doc as described earlier) we had the pleasure of having Daniel Bünzli and Xavier Clerc work here. Here’s what they did in their own words.

Xavier Clerc: OCamlJava

Xavier Clerc took a break from his regular duties at INRIA to join us over the summer to work on OCaml-Java and adapt it to the latest JVM features. This is an incredibly important project to bridge OCaml with the huge Java community, and here’s his report:

After a four-month visit to the OCaml Labs dedicated to the OCaml-Java project, the time has come for an appraisal! The undertaken work can be split into two areas: improvements to code generation, and interaction between the OCaml & Java languages. Regarding code generation, several classical optimizations have been added to the compiler, for example loop unrolling, more aggressive unboxing, better handling of globals, or partial evaluation (at the bytecode level). A new tool, namely ocamljar, has been introduced allowing post-compilation optimizations. The underlying idea is that some optimizations cannot always be applied (e.g. depending whether multiple threads/programs will coexist), but enabling them through command-line flags would lead to recompilation and/or multiple installations of each library according to the set of chosen optimizations. It is thus far more easier to first build an executable jar file, and then modify it according to these optimizations. Furthermore, this workflow allows the ocamljar tool to take advantage of whole-program information for some optimizations. All these improvements, combined, often lead to a gain of roughly 1/3 in terms of execution time.

Regarding language interoperability, there are actually two directions depending on whether you want to call OCaml code from Java, or want to call Java code from OCaml. For the first direction, a tool allows to generate Java source files from OCaml compiled interfaces, mapping the various constructs of the OCaml language to Java classes. It is then possible to call functions, and to manipulate instances of OCaml types in pure Java, still benefiting from the type safety provided by the OCaml language. In the other direction, an extension of the OCaml typer is provided allowing to create and manipulate Java instances directly from OCaml sources. This typer extension is indeed a thin layer upon the original OCaml typer, that is mainly responsible for encoding Java types into OCaml types. This encoding uses a number of advanced elements such as polymorphic variants, subtyping, variance annotations, phantom typing, and printf-hack, but the end-user does not have to be aware of this encoding. On the surface, the type of instances of the Java Object classes is java'lang'Object java_instance, and instances can be created by calling Java.make Object().

While still under heavy development, a working prototype is available, and bugs can be reported. Finally, I would like to thank the OCaml Labs for providing a great working environment.

Daniel Bünzli: Typography and Visualisation

Daniel joined us from Switzerland, and spent some time at Citrix before joining us in OCaml Labs. All of his software is now on OPAM, and is seeing ever-increasing adoption from the community.

Released a first version of Vg I’m especially happy about that as I wanted to use and work on these ideas since at least 2008. The project is a long term project and is certainly not finished yet but this is already a huge step.

Adjusted and released a first version of Gg. While the module was already mostly written before my arrival to Cambridge, the development of Vg and Vz prompted me to make some changes to the module.

released Otfm, a module to decode OpenType fonts. This is a work in progress as not every OpenType table has built-in support for decoding yet. But since it is needed by Vg’s PDF renderer I had to cut a release. It can however already be used to implement certain simple things like font kerning with Vg, this can be seen in action in the vecho binary installed by Vg.

Started to work on Vz, a module for helping to map data to Vg images. This is really unfinished and is still considered to be at a design stage. There are a few things that are however well implemented like (human) perceptually meaningful color palettes and the small folding stat module (Vz.Stat). However it quickly became evident that I needed to have more in the box w.r.t. text rendering in Vg/Otfm. Things like d3js entirely rely on the SVG/CSS support for text which makes it easy to e.g. align things (like tick labels on such drawings). If you can’t rely on that you need ways of measuring rendered text. So I decided to suspend the work on Vz and put more energy in making a first good release of Vg. Vz still needs quite some design work, especially since it tries to be independent of Vg’s backend and from the mechanism for user input.

Spent some time figuring out a new “opam-friendly” release workflow in pkgopkg. One of my problem is that by designing in the small for programming in the large — what a slogan — the number of packages I’m publishing is growing (12 and still counting). This means that I need to scale horizontally maintenance-wise unhelped by the sad state of build systems for OCaml. I need tools that make the release process flawless, painless and up to my quality standards. This lead me to enhance and consolidate my old scattered distribution scripts in that repo, killing my dependencies on Oasis and ocamlfind along the way. (edited for brevity, see here)

Daniel also left his bicycle here for future visitors to use, and the “Bünzli-bike” is available for our next visitor! (Louis Gesbert even donated lights, giving it a semblance of safety).

Industrial Fellows

Most of our regular funding bodies such as EPSRC or EU FP7 provide funding, but leave all the intellectual input to the academics. A compelling aspect of OCaml Labs has been how involved our industrial colleagues have been with the day-to-day problems that we solve. Both Jane Street and Citrix have senior staff regularly visiting our group and working alongside us as industrial fellows in the Computer Lab.

Research Projects

The other 100% of our time at the Labs is spent on research projects. When we started the group, I wanted to set up a feedback loop between local people using OCaml to build systems, with the folk developing OCaml itself. This has worked out particularly well with a couple of big research projects in the Lab.

 Mirage

Mirage is a library operating system written in OCaml that compiles source code into specialised Xen microkernels, developed at the Cambridge Computer Lab, Citrix and the Horizon Digital Economy institute at Nottingham. This year saw several years of effort culminate in the first release of Mirage 1.0 as a self-hosting entity. While Mirage started off as a quick experiment into building specialised virtual appliances, it rapidly became useful to make into a real system for use in bigger research projects. You can learn more about Mirage here, or read the Communications of the ACM article that Dave Scott and I wrote to close out the year.

This project is where the OCaml Labs “feedback loop” has been strongest. A typical Mirage application consists of around 50 libraries that are all installed via OPAM. These range from device drivers to protocol libraries for HTTP or DNS, to filesystems such as FAT32. Coordinating regular releases of all of these would be near impossible without using OPAM, and has also forced us to use our own tools daily, helping to sort out bugs more quickly. You can see the full list of libraries on the OCaml Labs software page.

Mirage is also starting to share code with big projects such as XenServer now, and we have been working with Citrix engineers to help them to move to the Core library that Jane Street has released (and that is covered in Real World OCaml). Moving production codebases this large can take years, but OCaml Labs is turning out to be a good place to start unifying some of the bigger users of OCaml into one place. We’re also now an official Xen Project incubator project, which helps us to validate functional programming to other Linux Foundation efforts.

Nymote and User Centric Networking

The release of Mirage 1.0 has put us on the road to simplifying embedded systems programming. The move to the centralized cloud has led to regular well-publicised privacy and security threats to the way we handle our digital infrastructure, and so Jon Crowcroft, Richard Mortier and I are leading an effort to build an alternative privacy-preserving infrastructure using embedded devices as part of the User Centric Networking project, in collaboration with a host of companies led by Technicolor Paris. This work also plays on the strong points of OCaml: it already has a fast ARM backend, and Mirage can easily be ported to the new Xen/ARM target as hardware becomes available.

One of the most difficult aspects of programming on the “wide area” Internet are dealing with the lack of a distributed identity service that’s fully secure. We published our thoughts on this at the USENIX Free and Open Communications on the Internet workhsop, and David Sheets is working towards a full implementation using Mirage. If you’re interested in following this effort, Amir Chaudhry is blogging at the Nymote project website, where we’ll talk about the components as they are released.

 Data Center Networking

Its A Dog's Day Out There...

At the other extreme from embedded programming is datacenter networking, and we started the Network-as-a-Service research project with Imperial College and Nottingham. With the rapid rise of Software Defined Networking this year, we are investigating how application-specific customisation of network resources can build fast, better, cheaper infrasructure. OCaml is in a good position here: several other groups have built OpenFlow controllers in OCaml (most notably, the Frenetic Project), and Mirage is specifically designed to assemble such bespoke infrastructure.

Another aspect we’ve been considering is how to solve the problem of optimal connectivity across nodes. TCP is increasingly considered harmful in high-through, high-density clusters, and George Parisis led the design of Trevi, which is a fountain-coding based alternative for storage networking. Meanwhile, Thomas Gazagnaire (who joined OCaml Labs in November), has been working on a branch-consistent data store called Irminsule which supports scalable data sharing and reconciliation using Mirage. Both of these systems will see implementations based on the research done this year.

 Higher Kinded Programming

Jeremy Yallop and Leo White have been developing an approach that makes it possible to write programs with higher-kinded polymorphism (such as monadic functions that are polymorphic in the monad they use) without using functors. It’s early days yet, but there’s a library available on OPAM that implements the approach, and a draft paper that outlines the design.

Priorities for 2014

This year has been a wild ride to get us up to speed, but we now have a solid sense of what to work on for 2014. We’ve decided on a high-level set of priorities led by the senior members of the group:

These are guidelines to choosing where to spend our time, but not excluding other work or day-to-day bugfixing. Our focus on collaboration with Jane Street, Citrix, Lexifi, OCamlPro and our existing colleagues will continue, as well as warmly welcoming new community members that wish to work with us on any of the projects, either via internships, studentships or good old-fashioned open source hacking.

I appreciate the whole team’s feedback in editing this long post into shape, the amazing professorial support from Jon Crowcroft, Ian Leslie and Alan Mycroft throughout the year, and of course the funding and support from Jane Street, Citrix, RCUK, EPSRC, DARPA and the EU FP7 that made all this possible. Roll on 2014, and please do get in touch with me with any queries!



all posts