home · projects · papers · blog · gallery · contact
anil madhavapeddy // anil.recoil.org

Announcing the OCaml Labs project

19 October 2012   |   Anil Madhavapeddy   |   tags: ocaml,cambridge   |   all posts

I’m very excited to announce OCaml Labs, the latest project to hit the Cambridge Computer Lab. As anyone that hangs out near me probably realises, I very much enjoy functional programming. My weapon of choice tends to be OCaml, as it condenses decades of research into a pragmatic blend of functional, imperative and object-oriented programming styles. What’s perhaps less well known are the steady inroads that OCaml has been making into mission-critical areas of industry. At Jane Street, billions of dollars of transactions are routed through a huge ML code-base that is designed to catch bugs at compile-time. At Citrix, the Xen management toolstack that powers millions of hosts in the cloud is largely written in OCaml. Facebook does sophisticated static analysis using OCaml over their vast PHP codebase to close security holes.

The OCaml community is small but dedicated, but there is always more to do to improve the language and ecosystem. So, thanks to a generous platform grant from Jane Street, we are launching a program to help with the open-source development of OCaml from Cambridge.

The OCaml Labs are based in the Cambridge Computer Lab and led my myself, Alan Mycroft and Ian Leslie. We’re closely affiliated with other groups, and will be:

Research efforts

Of course, it is difficult to hack on a language in a void, and we also use OCaml heavily in our own research. The other half of OCaml Lab’s goals are more disruptive (and riskier!):

Getting involved

So, how can you get involved? We are initially advertising three positions for full-time developers and researchers (junior and senior) to help us get started with the OCaml Platform and compiler development. These aren’t conventional pure research jobs, and a successful candidate should enjoy the open-source development cycle (you retain your own copyright for your own projects). The Computer Lab offers a pretty unique environment: a friendly, non-hierarchical group in a beautiful city, and some of the best faculty and students you could hope to hang out with.

And finally, there is a longer lead time on applying for PhDs, but this is a great time to get involved. When I started at the Lab in 2002, a little project called Xen was just kicking off, and many of us had a wild (and oft great) time riding that wave. Get in touch with myself, Alan, Ian or Jon soon if you are interested in applying! There’s some more information available on the OCaml Labs pages about options.


Why you should go to ICFP 2012

31 July 2012   |   Anil Madhavapeddy   |   tags: icfp,cufp,oud,fp,ocaml,haskell,erlang   |   all posts

Functional programming has been gaining popularity pretty rapidly recently. We’ve got serious projects from big Internet shops (Microsoft, Twitter and Facebook), to varied domains such as radio, knowledge bases, the energy grid, cryptography, NoSQL databases and storage, intricate PS3 games, and even making safer cars. Throughout all of this, one conference has been around since the very beginning: the ACM SIGPLAN International Conference on Functional Programming. The early days consisted of academics (some with fine beards and sandals) developing the tools and theories behind FP, and spawned the famous ICFP Programming Contest. The modern ICFP, however, also caters to the practical industry practitioner at all levels of knowledge, and this blog post is meant to introduce you to what to expect this year.

ICFP 2012 is a week-long conference on September 9-15th, held in beautiful Copenhagen. The theory-oriented workshops are held just before the main event on Sunday, on topics such as cross-model, generic and higher-order programming. The main conference lasts three days, with a combination of academic papers and experience reports that are formally published by the ACM (see my experience report or a Xenstored full paper for two Xen-related publications).

The main conference finishes mid-week, and the focus switches to more informal, interactive workshops that are relevant to the FP practitioner. The registration system lets you buy a day pass and attend any combination, so here’s a list that I’ve come up, along with “who” it is intended for. Needless to say, these are my personal views (although I’m co-chairing CUFP and the OCaml Workshop, and any mistakes there would be bad!).

Commercial Users of Functional Programming (CUFP)

CUFP has been going since 2004, acting as a voice for commercial users of functional programming languages and technology. CUFP is the biggest workshop at ICFP and spread over the last three days of the conference week.

I’m co-chairing CUFP this year along with Michael Sperber, so feel free to direct any questions you have about it to us, and see below for registration information.

Language Workshops

Some of the bigger communities have day-long workshops which are a combination of short presentations and regular breaks for discussions. If you are using (or just considering the use of) these languages, then these are a good way to meet all the right people who could help you with your efforts.

OCaml and ML

I’m co-chairing the first OCaml Users and Developers workshop on Friday (Sep 14th). This event has traditionally been run in Paris in past years, but we decided it was getting big enought to run alongside ICFP. The talk schedule is a mix of experience reports and pratical new toolchain developments. There will be talks on all the latest development tools (memory profiling, concurrent programming), standard library efforts (Core, OPAM packaging), as well as experience reports (the Arakoon k/v store, Xen Cloud, and building 3D WebGL engines for NASA, among others). The hot topic at this workshop is the emerging consensus on building a more integrated OCaml Platform that acts as a stable base for larger applications, and the state-of-the-world talks from Xavier Leroy, Yaron Minsky and Fabrice le Fessant will no doubt kick off heated discussions that will go on late into dinner (last year, we all ended up in a very fine Tokyo restaurant drinking sake and arguing about GADTs and the global financial crisis).

OCaml is the most widely used variant of the ML family, and the ML workshop on the day before (Sep 13) takes a broader view on the theory and implementations of ML. Talks here include the Coq theorem prover, efficient implementations of OCaml running on the JVM, multicore-ML, and experimental extensions to ML-style programming (effect tracking, model checking, applicative functors, and GADTs).

Both of these workshops are convenient to attend over a two-day period (Thu and Fri), and will give you a huge amount of information about the latest developments in the land of ML.

Haskell

The Haskell community has been growing rather fast over the last few years, and so there are two separate days of talks.

The Haskell Symposium has formal published proceedings, and consists of a diverse array of research work that has direct implications on the future of Haskell. My personal highlights include the parallel array fusion and vectorisation for data processing on modern hardware, the intriguing “Wormhole” effect-tracking FRP, and Safe Haskell for practical information-flow tracking.

The second day has the Haskell Implementors Workshop, which is a more informal affair aimed at the day-to-day infrastructure and toolchain needs of Haskell. The schedule includes overview talks from the two Simons, and ongoing work on distributed programming and Javascript compilation.

So if you are after a broad overview of Haskell then attend the Symposium, and if you wish to join the Haskell Hogwarts and start compiler hacking, then the implementors workshop will have all the right people to help you out.

Erlang

The Erlang workshop at ICFP has been running for eleven years now, and tends to be a combination of big projects and work-in-progress reports. This year has the latest on porting Erlang to use LLVM for high-performance computing, as well as experience reports on scalability testing and the widely-used Riak distributed database. Despite my prelediction towards static typing, Erlang is a language that I keep intending to make bigger use of, so I plan to drop into this session for sure.


Last year we had to learn Japanese, the hardest functional language

We also made a lot of new local friends

Luckily, the weather made us feel at home though
 

Registration

This has been a whirlwind tour of the ICFP week (or rather, me writing this down while planning my own schedule). The early registration deadline is next week (Aug 9th), and the online registration has all the details of the full schedule and day prices. There is also local information about where to stay and visit in Copenhagen, if you want to take a late-summer break while making the trip.

Something worth noting is that all of this is run under the auspices of the ACM by volunteers, which accounts for the somewhat chaotic spread of information across all the websites. We’ve tried to demystify the CUFP process here, but please get in touch with any of the organisers if you are confused and want some help. We always welcome volunteer offers too, and the first round of drinks is on me for anyone who helps out during the week!


Dreaming of an ARM OCaml

25 February 2012   |   Anil Madhavapeddy   |   tags: plugcomputer,ocaml,dreamplug   |   all posts
dreamplug

I’ve been meaning to play with Plug Computers for some time now, as I need a low-power embedded system around the house. I recently bought a Soekris Net6501 (a pretty powerful Intel CPU, that even has VT support), but had annoying issues getting it working reliably. I ordered an ARM-based Dreamplug as an alternative (and as a bonus, the Dreamplug is 6x cheaper than the Soekris!). Here are my notes on getting it to work.

Requirements:

The Dreamplug arrived with a working installation, but running the absolutely ancient Debian Lenny. A dist-upgrade through to Wheezy led to bricking it almost immediately, and so I did a fresh installation from scratch.

For a fresh installation, place a USB stick of suitable size (greater than 2GB is best) into your functional Debian installation. Then:

	$ sudo mount /dev/sdb1 /mnt
	$ sudo cp uImage /mnt
	$ sudo umount /mnt
	
	$ sudo apt-get install qemu-user-static debootstrap
	$ sudo mount /dev/sdb2 /mnt
	$ sudo mkdir -p /mnt/usr/bin
	$ sudo cp /usr/bin/qemu-arm-static /mnt/usr/bin/
	$ sudo qemu-debootstrap --arch=armel wheezy http://ftp.uk.debian.org/debian/
	
	$ cd /mnt
	$ sudo tar -zxvf ~/sheeva-3.2.7-Modules.tar.gz
	$ sudo chroot /mnt
	$ depmod -a
	# edit /etc/network/interfaces
	# edit /etc/resolv.conf
	
	blacklist libertas
	blacklist libertas_sdio
	

OCaml on ARM

One of the reasons I wanted an ARM-based setup is to experiment with the OCaml native code generation. Benedikt Meurer has been doing some excellent work on improving code generation for embedded systems, including support for 16-bit Thumb code, exception backtraces, and dynamic linking and profiling.

Once Linux was up and running, compiling up the latest ocaml-trunk was straightforward.

	$ sudo apt-get install build-essential git
	$ git clone http://github.com/OCamlPro/ocp-ocaml svn-trunk
	$ cd ocp-ocaml
	$ ./configure && make world opt opt.opt install
	

This compiles the bytecode and native code compilers, and then compiles them again using the native code generator. This takes a while to do on the poor little ARM CPU. Once that finished, I compiled up a few simple modules and they worked great! Since the trunk of OCaml is a development branch, you may run into a few packaging issues (use the very latest OASIS to regenerate any setup.ml, and you will need a small patch until PR 5503 is applied).

Incidentally, if anyone is interested in working on a Mirage port to ARM as an internship in the Cambridge Computer Lab, do get in touch with me…


Commercial Users of Functional Programming 2011 Schedule

09 August 2011   |   Anil Madhavapeddy   |   tags: conferences,ocaml   |   all posts

The CUFP 2011 schedule is now available and the conference registration is open! The cufp.org website is having a few DNS propagation issues at the moment and is unreachable, so I have replicated the schedule below for your perusal. Hope to see lots of new faces in Tokyo!

September 22nd, 2011


TimeSpeakers
09:00 AM - 12:30 PM T1: Building Reliable Client-Server Applications in Erlang
Erlang Solutions Ltd.
09:00 AM - 12:30 PM T2: JaneStreet's OCaml Core Library
Jane Street
02:00 PM - 05:30 PM T3: Building a Functional OS
University of Cambridge
Citrix
university of nottingham
02:00 PM - 05:30 PM T4: Collaborative Scientific Software
New York University

September 23rd, 2011


September 24th, 2011


TimeSpeakers
09:00 AM - 10:00 AM Keynote: Pragmatic Haskell
Standard Chartered Bank
10:00 AM - 10:30 AM Tea Break
10:30 AM - 11:00 AM Theorem-based derivation of an AES Implementation
11:00 AM - 11:30 AM Discrete Event Simulation using Erlang
EDF R&D, energy utility
11:30 AM - 12:00 PM Model based testing of AUTOSAR automotive components
12:00 PM - 01:30 PM Lunch Break
01:30 PM - 02:00 PM HTML5 web application development in OCaml
IT Planning, Inc.
02:00 PM - 02:30 PM Large-scale Internet Services in Scala at Twitter
Twitter
02:30 PM - 03:00 PM Applying Functional Programming to Build Platform-Independent Mobile Applications
IntelliFactory
03:00 PM - 03:30 PM Tea Break
03:30 PM - 04:00 PM Fourteen Days of Haskell: A Real Time Programming Project in Real Time
Alcatel-Lucent
04:00 PM - 04:30 PM Disco: using Erlang to implement Mapreduce, Nokia
Erlang, Nokia
Erlang, Nokia
Nokia Research
04:30 PM - 05:00 PM Functional mzScheme DSLs in Game Development
Naughty Dog Inc.
05:00 PM - 05:30 PM OCaml and Acunu Experience Report
Acunu
Acunu

DataCaml - a first look at distributed dataflow programming in OCaml

18 June 2011   |   Anil Madhavapeddy   |   tags: ocaml,distributed   |   all posts

Distributed programming frameworks like Hadoop and Dryad are popular for performing computation over large amounts of data. The reason is programmer convenience: they accept a query expressed in a simple form such as MapReduce, and automatically take care of distributing computation to multiple hosts, ensuring the data is available at all nodes that need it, and dealing with host failures and stragglers.

A major limitation of Hadoop and Dryad is that they are not well-suited to expressing iterative algorithms or dynamic programming problems. These are very commonly found patterns in many algorithms, such as k-means clustering, binomial options pricing or Smith Waterman for sequence alignment.

Over in the SRG in Cambridge, we developed a Turing-powerful distributed execution engine called CIEL that addresses this. The NSDI 2011 paper describes the system in detail, but here’s a shorter introduction.

The CIEL Execution Engine

CIEL consists of a master coordination server and workers installed on every host. The engine is job-oriented: a job consists of a graph of tasks which results in a deterministic output. CIEL tasks can run in any language and are started by the worker processes as needed. Data flows around the cluster in the form of references that are fed to tasks as dependencies. Tasks can publish their outputs either as concrete references if they can finish the work immediately or as a future reference. Additionally, tasks can dynamically spawn more tasks and delegate references to them, which makes the system Turing-powerful and suitable for iterative and dynamic programming problems where the task graph cannot be computed statically.

The first iteration of CIEL used a domain-specific language called Skywriting to coordinate how tasks should run across a cluster. Skywriting is an interpreted language that is “native” to CIEL, and when it needs to block it stores its entire execution state inside CIEL as a continuation. Derek Murray has written a blog post explaining this in more detail.

More recently, we have been working on eliminating the need for Skywriting entirely, by adding direct support for CIEL into languages such as Python, Java, Scala, and the main subject of this post – OCaml. It works via libraries that communicate with CIEL to spawn tasks, publish references, or suspend itself into the cluster to be woken up when a future reference is completed.

DataCaml API

Rather than go into too much detail about the innards of CIEL, this post describes the OCaml API and gives some examples of how to use it. The simplest interface to start with is:

	  type 'a ref
	  val deref : 'a ref -> 'a
	

The type 'a ref represents a CIEL reference. This data might not be immediately present on the current node, and so must be dereferenced using the deref function.

If the reference has been completed, then the OCaml value is unmarshalled and returned. If it is not present, then the program needs to wait until the computation involving the reference has completed elsewhere. The future reference might contain a large data structure and be on another host entirely, and so we should serialise the program state and spawn a task that is dependent on the future’s completion. This way, CIEL can resume execution on whatever node finished that computation, avoiding the need to move data across the network.

Luckily, we do not need to serialise the entire heap to suspend the program. DataCaml uses the delimcc delimited continuations library to walk the stack and save only the subset required to restart this particular task. Delimcc abstracts this in the form a “restartable exception” that supplies a closure which can be called later to resume the execution, as if the exception had never happened. Delimcc supports serialising this closure to an output channel, which you can read about in Oleg’s paper.

So how do we construct references? Lets fill in more of the interface:

	module Ciel = struct
	  type 'a ref
	  val deref : 'a ref -> 'a
	  val spawn : ('a -> 'b) -> 'a -> 'b ref
	  val run : (string list -> 'a) -> ('a -> string) -> unit
        end
	

The spawn function accepts a closure and an argument, and returns a future of the result as a reference. The run function begins the execution of a job, with the first parameter taking some string arguments and returning an 'a value. We also supply a pretty-printer second argument to convert the 'a into a string for returning as the result of the job (this can actually be any JSON value in CIEL, and just simplified here).

	let r1 = spawn (fun x -> x + 5) arg1 in
  	let r2 = spawn (fun x -> deref r1 + 5) arg1 in
	deref r2 
	

We first spawn a function r1 which simply adds 5 to the job argument. A job in CIEL is lazily scheduled, so this marshals the function to CIEL, creates a future, and returns immediately. Next, the r2 function spawns a task which also adds 5, but to the dereferenced value of r1. Again, it is not scheduled yet as the return reference has not been dereferenced.

Finally, we attempt to dereference r2, which causes it be scheduled on a worker. While executing, it will try to dereference r1 that will schedule it, and all the tasks will run to completion.

Programming language boffins will recognise that this interface is very similar to AliceML’s concept of lazy futures. The main difference is that it is implemented as a pure OCaml library, and uses a general-purpose distributed engine that can also work with other languages.

Streaming References

The references described so far only have two states: they are either concrete or futures. However, there are times when a task can progressively accept input and make forward progress. For these situations, references can also be typed as opaque references that are accessed via in_channel and out_channel, as networks are:

	type opaque_ref

	val spawn_ref : (unit -> opaque_ref) -> opaque_ref
	val output : ?stream:bool -> ?pipe:bool -> (out_channel -> unit) -> opaque_ref
	val input : (in_channel -> 'a) -> opaque_ref -> 'a
	

This interface is a lower-level version of the previous one:

The CIEL engine actually supports multiple concurrent input and output streams to a task, but I’ve just bound it as a single version for now while the bindings find their feet. Here’s an example of how streaming references can be used:

	let x_ref = spawn_ref (fun () ->
	  output ~stream:true (fun oc ->
	    for i = 0 to 5 do
	      Unix.sleep 1;
	      fprintf oc "%d\n%!" i;
	    done
	  )
	) in
	let y_ref = spawn_ref (fun () ->
	  input (fun ic ->
	    output ~stream:true (fun oc ->
	      for i = 0 to 5 do
	        let line = input_line ic in
	        fprintf oc "LINE=%s\n%!" line
	      done
	    )
	  ) x_ref
	) in
	

We first spawn an x_ref which pretends to do 5 seconds of work by sleeping and outputing a number. This would of course be heavy number crunching in a real program. The y_ref then inputs this stream, and outputs its own result by prepending a string to each line.

Try it out

If you are interested in a more real example, then read through the binomial options calculator that uses streaming references to parallelise a dynamic programming problem (this would be difficult to express in MapReduce). On my Mac, I can run this by:

Discussion

The DataCaml bindings outlined here provide an easy way to write distributed, fault-tolerant and cluster-scheduled jobs in OCaml. The current implementation of the engine is aimed at cluster computation, but Malte has been working on condensing CIEL onto multicore hardware. Thus, this could be one approach to ‘solving the OCaml multicore problem’ for problems that fit nicely into the dataflow paradigm.

The biggest limitation for using these bindings is that delimited continuation serialisation only works in bytecode. Native code delimcc supports shift/reduce in the same program, but serialising is problematic since native code continuations contain a C stack, which may have unwrapped integers. One way to work around this is by switching to a monadic approach to dereferencing, but I find delimcc programming more natural (also see this discussion).

Another important point is that tasks are lazy and purely functional (remind you of Haskell?). This is essential for reliable fault-tolerance and reproducibility, while allowing individual tasks to run fast, strict and mutable OCaml code. The tasks must remain referentially transparent and idempotent, as CIEL may choose to schedule them multiple times (in the case of faults or straggler correction). Derek has been working on integrating non-determinism into CIEL, so this restriction may be relaxed soon.

Finally, these ideas are not limited to OCaml at all, but also apply to Scala, Java, and Python. We have submitted a draft paper dubbed A Polyglot Approach to Cloud Programming with more details and the ubiquitous evaluation versus Hadoop. There is a really interesting line to explore between low-level MPI coding and high-level MapReduce, and we think CIEL is a useful spot in that design space.

Incidentally, I was recently hosted by Nokia Research in Palo Alto by my friend Prashanth Mundkur, where they work on the Python/Erlang/OCaml Disco MapReduce engine. I’m looking forward to seeing more critical comparisons and discussions of alternatives to Hadoop, from them and others.

Thanks are due to Derek, Chris and Malte for answering my incessant CIEL questions while writing this post! Remember that DataCaml is a work in progress and a research prototype, and feedback is most welcome.


Recoil gets IPv6 at last

08 June 2011   |   Anil Madhavapeddy   |   tags: recoil,ipv6   |   all posts

Since it is World IPv6 Day today, I decided to update the recoil.org machines to support it.

The changes required turned out to be very straightforward. Our amazing hosting provider Bytemark has been IPv6-ready for some time, and so I just had to reconfigure OpenBSD to add the IPv6 equivalent address as an alias to the network device. This is done by adding a single line to the /etc/hostname.re0 file:

inet6 alias 2001:41c8:0010:02ad::1 64

…and an additional default route in the /etc/mygate file:

fe80::1%re0

The %re0 at the end indicates which network interface to route across, as fe80::1 is a link-local address.

Then, I had to add an IPv6 AAAA record in the DNS. We use a hosting provider called EasyDNS who have support for this in their new user interface. One thing to be aware of is that you must upgrade your domain to their new system if you haven’t touched the DNS setup for a whole, which took me a while to find out (but they responded very quickly on Twitter!).

Once that was done, I can now check the host record in DNS, and connect to Google over IPv6!

$ host -t aaaa www.recoil.org
www.recoil.org is an alias for static.recoil.org.
static.recoil.org is an alias for dark.recoil.org.
dark.recoil.org has IPv6 address 2001:41c8:10:2ad::1

$ ping6 ipv6.google.com
PING6(56=40+8+8 bytes) 2001:41c8:10:2ad::1 --> 2a00:1450:8002::69
16 bytes from 2a00:1450:8002::69, icmp_seq=0 hlim=56 time=13.313 ms
16 bytes from 2a00:1450:8002::69, icmp_seq=1 hlim=56 time=13.165 ms
16 bytes from 2a00:1450:8002::69, icmp_seq=2 hlim=56 time=13.332 ms

Awesome! One final change is to modify system daemons to also listen on IPv6 in addition to IPv4. I’ve only changed our nginx web server so far, by following these IPv6 instructions. The OpenBSD ports version of nginx already has IPv6 support compiled in, so I just added a single line to the configuration:

listen [::]:80;

Since I don’t have access to any other IPv6 machines, I tested that the website is available by going to the IPv6 website reachability tester, and typing in www.recoil.org. And it all worked, and I got this cool little badge to display here…

ipv6 ready

all posts