Camel Spotting in Paris


I'm at the 2011 OCaml Users Group in Paris, reporting on some splendid talks this year. It looked like around 60-70 people in the room, and I had the pleasure of meeting users all the way from Russia to New York as well as all the Europeans!

Js_of_ocaml

First up was Pierre Chambart talking about the js_of_ocaml compiler. It compiles OCaml bytecode directly to Javascript, with few external dependencies. Since the bytecode format changes very rarely, it is simpler to maintain than alternatives (such as Jake Donham’s ocamljs) that require patching the compiler tool-chain. Javascript objects are mapped to dynamic OCaml objects via a light-weight ## operator, so you can simply write code like:

    class type window = object
      method alert : js_string t -> unit meth
      method name : js_string t prop
    end

    let window : window t =
      Js.Unsafe.variable "window"

    let () =
      window##alert (window##name);
      window##name <- Js.string "name"

Overloading is handled similarly to PyObjC, with each parameter combination being mapped into a uniquely named function. Raphael Proust then demonstrated a cool game he wrote using bindings to the Raphael Javascript vector graphics library. Performance of js_of_ocaml is good compared to hand-written Javascript, and they have quite a few benchmarks on their website.

Overall the project looks very usable: the main omissions are Bigarray, dynlink, Str (replaced by native regexps), recursive modules and weak references. None of these missing features seems very critical for the sorts of applications that js_of_ocaml is intended for.

OCaml on a PIC (OCAPIC)

Next up, Philippe Wang presented something completely different: running OCaml on tiny 8-bit PIC microcontrollers! These PICs have 4-128KB of flash (to store the code), and from 256 bytes to 4 kilobytes of RAM. Not a lot of room to waste there. He demonstrated a game with 24 physical push buttons that beat humans at a conference (JFLA).

It works by translating OCaml bytecode through several stages: ocamlclean to eliminate dead code in the bytecode (which would be very useful for native code too!), a compression step that does run-length encoding, and then translation to PIC assembly. They have a replacement stop-and-copy GC (150 lines of assembly), and a full collection cycle runs in less than 1.5ms. Integers are 15 bits wide (with 1 bit reserved) and the block representation is the same as native OCaml. Very cool project!
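To make the 15-bit integer representation concrete, here is a minimal sketch of how a tagged integer could be packed into a 16-bit word. It assumes that the reserved bit is the lowest bit and is set to 1 for integers, as in standard OCaml; the actual OCAPIC layout was not shown in the talk, and the function names are made up for illustration.

```ocaml
(* Sketch only: models a 16-bit word holding a 15-bit signed integer.
   Assumption: the reserved bit is the lowest bit, set to 1 for integers,
   as in standard OCaml. The real OCAPIC layout may differ. *)

let min_int15 = -16384 (* -(2^14) *)
let max_int15 = 16383  (* 2^14 - 1 *)

(* Pack an integer into a 16-bit word: shift left and set the tag bit. *)
let encode (n : int) : int =
  if n < min_int15 || n > max_int15 then invalid_arg "encode: out of range";
  ((n lsl 1) lor 1) land 0xFFFF

(* A word is an immediate integer when its lowest bit is set. *)
let is_int (w : int) : bool = w land 1 = 1

(* Unpack: sign-extend from 16 bits, then drop the tag bit. *)
let decode (w : int) : int =
  let signed = if w land 0x8000 <> 0 then w - 0x10000 else w in
  signed asr 1
```

With this encoding, words whose lowest bit is 0 remain available for pointers to blocks, which is what lets a collector tell integers and pointers apart.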

Frama-C

We moved on to static analysis, and Julien Signoles presented Frama-C, a powerful static analysis tool for real-world C. It forks the CIL project from Berkeley and adds ocamlgraph and GUI support. He demonstrated a simple plugin that counts the loops in C code, and the homepage has many interesting plugins maintained by the community.

I hadn’t realised that CIL was still maintained in the face of clang, so it’s nice to see it live on as part of Frama-C.

Ocsigen

The ever-cheerful Vincent Balat updated us about the Ocsigen web framework, including unveiling their exciting new logo! This was written using an amazing collaborative editor that lets users edit in real time.

Ocsigen is based around services of type service: parameters -> page. Services are first-class values, and can be registered dynamically and associated with sessions. The collaborative editor took only about 100 lines of code.
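As a rough illustration of services being ordinary first-class values that can be registered at runtime, here is a minimal sketch in plain OCaml. It is not the Ocsigen API: the registry, the register and dispatch functions, and the "/hello" path are all hypothetical, and parameters and pages are simplified to strings.

```ocaml
(* Hypothetical sketch, NOT the Ocsigen API: a service is just a function
   from parameters to a page, so it can be stored and passed around. *)
type ('params, 'page) service = 'params -> 'page

(* A registry specialised to string parameters and string pages. *)
let registry : (string, (string, string) service) Hashtbl.t =
  Hashtbl.create 16

(* Register (or replace) a service under a path at runtime. *)
let register (path : string) (svc : (string, string) service) : unit =
  Hashtbl.replace registry path svc

(* Look up the service for a path and apply it to the parameters. *)
let dispatch (path : string) (params : string) : string option =
  match Hashtbl.find_opt registry path with
  | Some svc -> Some (svc params)
  | None -> None

(* An example service, registered dynamically. *)
let hello : (string, string) service =
  fun name -> "<p>Hello, " ^ name ^ "</p>"

let () = register "/hello" hello
```

A real framework would also type the parameters and tie registrations to sessions, which this sketch leaves out.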

There is a syntax extension to distinguish between client and server side code, and both can be written in the same service (invoking js_of_ocaml to compile the client code to Javascript). They have bindings to Google Closure in order to provide UI support. There is a really nice “bus” service to pass messages between the server and the client, with seamless integration of Lwt to hide the details of communication to the browser.

Ocsigen is looking like a very mature project at this point, and I’m very keen to integrate it with Mirage to specialise its applications into micro-kernels. A task for the hacking day tomorrow morning I think!

Mirage

I talked about Mirage, hurrah! There were good questions about why we need a block device (and don't just use NFS), and I replied that everything is available as a library and the programmer can choose depending on their needs (the core goal of exokernels).

A highlight for me was lunch, where I finally met Richard Jones, who is one of the other OCaml and cloud hackers out there. We had a wide-ranging conversation about the cool stuff going on in KVM and Red Hat in general. Richard also gave a short talk about how they use OCaml to generate hundreds of thousands of lines of code in libguestfs. There are bindings for pretty much every major language, and it is all generated from an executable specification. He noted that “normal” programmers love the OCaml type safety without explicit annotations, and that it is a really practical language for the working programmer. The Xen Cloud Platform also has a similar generator for XenAPI bindings, so I definitely agree with him about this!

OCaml Future

Xavier “superstar” Leroy then gave an update on OCaml development. Major new features in 3.12.0 are first-class modules, polymorphic recursion, local module opens, and richer operations over module signatures. Version 3.12.1 is coming out soon, with bug fixes (in camlp4 and ocamlbuild mainly), and better performance on x86_64: it turns out that a change of mov instruction improves floating point performance there.
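For readers who have not tried the 3.12 features yet, here is a small sketch that touches each of them. The module and function names (SHOW, IntShow, nested, depth and so on) are invented for illustration and were not part of the talk.

```ocaml
(* First-class modules: a module packed as an ordinary value. *)
module type SHOW = sig
  type t
  val show : t -> string
end

module IntShow = struct
  type t = int
  let show = string_of_int
end

let int_show = (module IntShow : SHOW with type t = int)

(* Unpack the module value and use it. *)
let show_with (type a) (m : (module SHOW with type t = a)) (x : a) : string =
  let module M = (val m : SHOW with type t = a) in
  M.show x

(* Polymorphic recursion: [depth] calls itself at type ['a list nested],
   which needs the explicit ['a.] annotation. *)
type 'a nested = Flat of 'a | Nest of 'a list nested

let rec depth : 'a. 'a nested -> int = function
  | Flat _ -> 0
  | Nest inner -> 1 + depth inner

(* Local module open: List's functions are in scope inside the parentheses. *)
let total = List.(fold_left ( + ) 0 (map succ [1; 2; 3]))

(* Richer signature operations: destructive substitution removes [t]. *)
module type INT_SHOW = SHOW with type t := int
module Checked : INT_SHOW = IntShow
```

Here show_with int_show 42 evaluates to "42", and total is 9.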

OCaml 3.13 has no release date, but several exciting features are in the pipeline. Firstly, more lightweight first-class modules by permitting some annotations to be inferred by the context, and it introduces patterns to match and bind first-class module values. Much more exciting is support for GADTs (Generalised Algebraic Data Types). This permits more type constraints to be enforced at compile time:

    type _ t =
      | IntLit : int -> int t
      | Pair : 'a t * 'b t -> ('a * 'b) t
      | App : ('a -> 'b) t * 'a t -> 'b t
      | Abs : ('a -> 'b) -> ('a -> 'b) t

    let rec eval : type s . s t -> s = function
      | IntLit x -> x (* s = int here *)
      | Pair (x,y) -> (eval x, eval y) (* s = 'a * 'b here *)
      | App (f,a) -> (eval f) (eval a)
      | Abs f -> f

In this example of a typed interpreter, the eval function is annotated with the type “type s . s t -> s”, which lets each branch of the pattern match constrain s differently depending on the constructor. This reminded me of Edwin Brady’s partial evaluation work using dependent types, but a much more restricted version suitable for OCaml.
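To see the typed interpreter in action, the sketch below repeats the definitions so that it is self-contained, and then evaluates a few terms. The example terms are illustrative and not from the talk, and the code needs a compiler with GADT support.

```ocaml
type _ t =
  | IntLit : int -> int t
  | Pair : 'a t * 'b t -> ('a * 'b) t
  | App : ('a -> 'b) t * 'a t -> 'b t
  | Abs : ('a -> 'b) -> ('a -> 'b) t

let rec eval : type s . s t -> s = function
  | IntLit x -> x
  | Pair (x, y) -> (eval x, eval y)
  | App (f, a) -> (eval f) (eval a)
  | Abs f -> f

(* Well-typed terms evaluate directly to OCaml values of the right type. *)
let forty_two : int = eval (App (Abs succ, IntLit 41))
let pair : int * int = eval (Pair (IntLit 1, App (Abs succ, IntLit 1)))

(* An ill-typed term is rejected at compile time, for example:
     App (IntLit 1, IntLit 2)
   fails because [IntLit 1] does not have a function type. *)
```

The point is that eval needs no runtime tag checks or error cases: ill-formed terms cannot be constructed in the first place.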

There are some really interesting uses for GADTs:

Enforcing invariants in data structures, as with the typed interpreter example above.
Reflecting types into values, which means that libraries such as our own dyntype can be expressed in the core language without lots of camlp4 hacks.
Simplifying typed I/O generators for XML, JSON and other network formats.
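As a sketch of what reflecting types into values can look like, the following defines a small GADT of type representations and a printer driven by it. The constructor names and the to_json function are hypothetical and are not the dyntype API; the output is only JSON-like, since strings are escaped with OCaml's own rules.

```ocaml
(* A value of type ['a ty] describes the structure of the type ['a]. *)
type _ ty =
  | TInt : int ty
  | TString : string ty
  | TList : 'a ty -> 'a list ty
  | TPair : 'a ty * 'b ty -> ('a * 'b) ty

(* A generic printer: the representation tells us how to walk the value. *)
let rec to_json : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | TInt -> string_of_int v
  | TString -> Printf.sprintf "%S" v
  | TList t -> "[" ^ String.concat "," (List.map (to_json t) v) ^ "]"
  | TPair (ta, tb) ->
    let (a, b) = v in
    Printf.sprintf "[%s,%s]" (to_json ta a) (to_json tb b)

let example = to_json (TList (TPair (TInt, TString))) [(1, "a"); (2, "b")]
```

Because the representation and the value share the type parameter, passing a value that does not match its representation is a compile-time error, with no syntax extension involved.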

The challenges in the implementation are that principal type inference is now impossible (so some annotation is required), and pattern matching warnings are also trickier.

From the IDE perspective, the third bit of work is to have the OCaml compiler save the full abstract syntax tree annotated with source locations, scoping information, types (declared and inferred) and additional user-defined annotations. This generalises the -annot flag and can help projects like OCamlSpotter, OCamlWizard, OcaIDE, etc. It also helps code-generators driven by type-generators (such as our SQL ORM or ATDgen).

The OCaml consortium has new members: MLState and MyLife have joined, and Esterel, OCamlPro and one unnamed new member are joining. The consortium's goals are to sell permissive (BSD) licensing to members, and to sound out new features with the serious users. Three companies are now doing commercial development (Gerd, OCamlCore, OCamlPro), which is growing the community nicely.

JoCaml

Luc Maranget (who looks like an archetypal mad professor!) gave a great rundown on JoCaml, a distributed programming extension to OCaml. This extends the compiler with join-definitions (a compiler patch) and a small bit of runtime support (using Thread), and adds significant extensions for concurrent and distributed programming in a type-safe way.

It extends the syntax with three new keywords: def, spawn and reply, and new usage for or and & (you should be using || and && anyway). Binary libraries remain compatible between matching versions of JoCaml and OCaml. An example of JoCaml code is:

    type t = {
      tick: unit Join.chan;
      wait: unit -> unit;
    }

    let create n =
      def st(rem) & tick() = st(rem-1)
      or st(0) & wait() = reply to wait in
      spawn st(n) ; { tick=tick; wait=wait; }

After n messages have been sent to tick, calls to the wait function (which acts as a barrier) are allowed to return.

    let c = create 10 (* one tick per loop iteration below *)
    let () =
      for k = 0 to 9 do
        spawn begin printf "%i" k; c.tick () end
      done;
      c.wait ()

Here we asynchronously print the numbers from 0 to 9, and the wait call then acts as a barrier until they have all been printed. JoCaml is useful for distributed fork-join parallelism tasks such as raytracing, but with the type system support of OCaml. It is a bit like MapReduce, but without the data partitioning support of Hadoop (and is more light-weight). It would be quite interesting to combine some of the JoCaml extensions with the dynamic dataflow graphs in our own CIEL distributed execution engine.

Forgetful Memoisation in OCaml

Francois Bobot talked about the problem of memoising values so that they can be re-used (e.g. in a cache). Consider a standard memoiser:

    let memo_f =
      let cache = H.create () in
      fun k ->
        try H.find cache k
        with Not_found ->
          let v = f k in
          H.add cache k v;
          v

    let v1 = memo_f k1
    let v2 = memo_f k2 (* if k2 = k1, this is an O(1) cache lookup *)

If a key is not reachable from anywhere other than the cache, we want to eliminate it from the cache as well. The first solution is a normal hashtable, but this results in an obvious memory leak, since holding a key in the cache keeps it reachable. A better solution is to use OCaml weak pointers, which permit references to values without keeping them alive (see Weaktbl by Zheng Li, who is now an OCaml hacker at Citrix). The problem with Weaktbl is that if the value points to the key, it forms a cycle which will never be reclaimed.

Francois solves this by using Ephemerons from Smalltalk. They use the rule that the value can be reclaimed if the key or the ephemeron itself can be reclaimed by the GC, and have a signature like:

    module Ephemeron : sig
      type ('a,'b) t
      val create : 'a -> 'b -> ('a,'b) t
      val check : ('a,'b) t -> bool
      val get : ('a,'b) t -> 'b option
      val get_key : ('a,'b) t -> 'a option
    end

The implementation in OCaml patches the runtime to use a new tag for ephemerons, and the performance graphs in his slides look good. This is an interesting topic for me since we need efficient memoisation in Mirage I/O (see the effects on DNS performance in the Eurosys paper which used Weaktbl). When asked if the OCaml patch will be upstreamed, Damien Doligez said he did not like the worst-case complexity of long chains of ephemerons in the GC. There are several approaches under consideration to alleviate this without too many changes to the runtime, but Francois believes the current complexity is not too bad in practice.
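For comparison, later OCaml releases (4.03 onwards) ship an Ephemeron module in the standard library, with an interface that differs from the signature shown in the talk. The sketch below rebuilds the memoiser on top of its Ephemeron.K1.Make weak hashtable functor; the function f, the call counter and the key are made up for illustration.

```ocaml
(* Weak-key cache built with the standard library's Ephemeron.K1.Make
   (available since OCaml 4.03). A binding can be dropped by the GC once
   its key is no longer reachable from outside the table. *)
module Cache = Ephemeron.K1.Make (struct
  type t = string
  let equal (a : string) (b : string) = a = b
  let hash = Hashtbl.hash
end)

(* Hypothetical expensive function; [calls] counts real invocations. *)
let calls = ref 0

let f (k : string) : int =
  incr calls;
  String.length k

(* Same shape as the standard memoiser, but with the weak table. *)
let memo_f : string -> int =
  let cache = Cache.create 16 in
  fun k ->
    try Cache.find cache k
    with Not_found ->
      let v = f k in
      Cache.add cache k v;
      v

(* Build the key at runtime so that it lives on the heap. *)
let k1 = String.concat "" ["ca"; "mel"]
let v1 = memo_f k1 (* computed by f *)
let v2 = memo_f k1 (* served from the cache while k1 is alive *)
```

Unlike a plain Hashtbl, this cache does not keep its keys alive, and unlike a table of plain weak pointers, a value that refers back to its own key does not prevent the pair from being reclaimed.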

Oasis and website

Sylvain came on stage later to give a demonstration of OASIS, an equivalent of Cabal for Haskell or CPAN for Perl. It works with a small _oasis file that describes the project, and then the OASIS tool auto-generates ocamlbuild files from it (this reminds me of Perl’s MakeMaker). Once the files are auto-generated, the project is self-contained and there is no further dependency on OASIS itself.

[Photo gallery: “How many OCaml hackers does it take to change a lightbulb?”; “Wearing bibs at French Teppanyaki”; “Team Mirage cheeses it up”]

OASIS either works with an existing build system in a project, or can be integrated more closely with ocamlbuild by advanced users. Lots of projects are already using OASIS (from Cryptokit to Lwt to the huge Jane Street Core). He is also working on a distribution mechanism on a central website, which should make for convenient OCaml packaging when it is finished and gets more adoption from the community.

Finally, Ashish Agarwal led a discussion on how OCaml can improve its web presence for beginners. Lots of good ideas here (some of which we implemented when reworking the CUFP website last year). Looking forward to seeing what happens next year in this space! I really enjoyed the day; the quality of talks was very high, and there were many engaging discussions from all involved!

Of course, not all of the OCaml community action is in France. The ever-social Jake Donham organised the first ever San Francisco user group, which I attended when I was over there a few weeks ago. OK, admittedly it was mainly French people there too, but it was excellent to meet up with Mika, Martin, Julien, Henri and of course Jake.

We should definitely have more of these fun local meetups, and a number of other OCaml hackers I mentioned it to want to attend next time in the Bay Area, if only to cry into their drinks about the state of multi-core... just kidding, OCamlPro is hard at work fixing that after all :-)

# 15th Apr 2011

notes ocamllabs