anil madhavapeddy // anil.recoil.org

Final Real World OCaml beta; the good, the bad and the ugly

06 August 2013   |   Anil Madhavapeddy   |   tags: ocaml

The second and final public beta of Real World OCaml is now available: https://realworldocaml.org

Release notes:

There’s been quite a bit of feedback and conversation about the book, so this also seemed like a good point to take stock of the process.

Crowd sourcing community feedback

Good: The decision to crowdsource feedback has been exhausting but very worthwhile, with over 2,200 comments posted (and over 2,000 resolved by us too!). O’Reilly has a similar platform called Atlas that wasn’t quite ready when we started our book, but I’d highly encourage new authors to go down this route and not stick with a traditional editorial scheme.

It’s simply not possible for a small group of technical reviewers to notice as many errors as the wider community has. Having said this, it’s interesting how much more focussed and critical the comments of our editor Andy Oram were when compared to most of the wider community feedback, so the commenting system is definitely a complement to the editorial process and not a replacement for it.

The GitHub requirement

Bad: After the first beta, we got criticized on a Hacker News thread for passing around GitHub OAuth tokens without SSL. This was entirely my fault, and I corrected the site to be pure-SSL within 24 hours.

Ugly: In my defence though, I don’t want the authority that all the reviewers have granted me over their GitHub accounts! We need just two things to enable commenting: an identity service to cut down on spam comments, and the ability to create issues in a public repository. Unfortunately, GitHub’s scope API requires you to also grant us access to commit to public code repositories. Add to this the fact that around 6,000 people have clicked through the OAuth API to review RWO, and you start to see just how much code we potentially have access to. I did try to reduce the damage by not actually storing the OAuth tokens on the server side. Instead, we store each token in the client using a secure cookie, so you can easily reset your browser to log out.
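As a rough illustration of the client-side storage described above, here is a minimal Python sketch of a `Set-Cookie` header carrying the `Secure` attribute, which tells the browser to send the cookie only over HTTPS. The cookie name `gh_token` and the one-hour lifetime are assumptions made up for this example; this is not the code that runs the RWO site.

```python
from http.cookies import SimpleCookie


def token_cookie_header(token, max_age=3600):
    """Build a Set-Cookie header value that keeps the token in the browser.

    The cookie name "gh_token" and the default lifetime are hypothetical.
    """
    cookie = SimpleCookie()
    cookie["gh_token"] = token
    cookie["gh_token"]["secure"] = True   # only ever sent over HTTPS
    cookie["gh_token"]["path"] = "/"
    cookie["gh_token"]["max-age"] = max_age
    return cookie["gh_token"].OutputString()


def logout_cookie_header():
    """Expire the cookie immediately, which is all that logging out requires."""
    return token_cookie_header("", max_age=0)
```

Because the server keeps no copy of the token, clearing or expiring the cookie is enough to end the session from the site's point of view.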

It’s not just about authentication either: another reader points out that if they use GitHub during work hours, they have no real way of separating the news streams that result.

Much of the frustration here is that there’s nothing I can do to fix this except wait for GitHub to hopefully improve their service. I very much hope that GitHub is listening to this and has internal plans to overhaul their privilege management APIs.

Infrastructure-free hosting

Good and Bad: One of my goals with the commenting infrastructure was to try to eliminate all server-side code, so that we could simply publish the book onto GitHub Pages and use JavaScript for the comment creation and listing.

This almost worked out. We still need a tiny HTTP proxy for comment creation, as we add contextual information such as a milestone to every new comment to make it easier to index. Setting a milestone requires privileged access to the repository, so our server-side proxy creates the issue using the user-supplied OAuth token (so that it originates from the commenter), and then updates it (via the bactrian account) to add the milestone and insert a little contextual comment pointing back to the book paragraph where the comment originated.
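The two-step flow above can be sketched in Python as a pair of functions that only describe the HTTP requests involved, without sending them. The endpoints are GitHub's public REST routes for creating an issue, updating an issue, and adding an issue comment. The repository name `example/book`, the token values, and the `Request` type are all hypothetical, and this is a sketch of the idea, not the proxy that the site actually runs.

```python
from dataclasses import dataclass

API = "https://api.github.com"


@dataclass
class Request:
    method: str
    url: str
    token: str   # sent as an "Authorization: token ..." header
    body: dict


def create_issue_request(repo, user_token, title, text):
    """Step 1: create the issue with the commenter's own token,
    so that the issue originates from the commenter."""
    return Request("POST", f"{API}/repos/{repo}/issues", user_token,
                   {"title": title, "body": text})


def annotate_requests(repo, bot_token, issue_number, milestone, paragraph_url):
    """Step 2: use a privileged account to set the milestone and
    add a comment pointing back to the paragraph in the book."""
    issue_url = f"{API}/repos/{repo}/issues/{issue_number}"
    return [
        # GitHub expects the milestone *number* here, not its title.
        Request("PATCH", issue_url, bot_token, {"milestone": milestone}),
        Request("POST", f"{issue_url}/comments", bot_token,
                {"body": f"Context: {paragraph_url}"}),
    ]
```

The split reflects the privilege boundary: the first request needs only the commenter's token, while the second and third need an account with write access to the repository, so the issue number returned by step 1 is the only state passed between them.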

Good: The other criticism from the online feedback was the requirement to have a GitHub login to read the book at all. This is a restriction that we intend to lift for the final release (which will be freely available online under a CC-BY-NC-ND license), but I think it’s absolutely the right decision to gate early adopters to get useful feedback. Even if we lost 90% of our potential reviewers through the GitHub auth wall, I don’t think we could have coped with another 10,000 comments in any case.

On the positive side, we didn’t have a single spam comment or any other abuse of the commenting system.

I’ve had quite a few queries about open-sourcing the scripts that drive the server-side commenting, and this is on my TODO list for after the final book has gone to production.

Auto-generating the examples

Bad: We tried for far too long during the book writing to stumble through with manual installation instructions and hand-copied code snippets and outputs. Some of our alpha reviewers pointed out vociferously that spending time on installation and dealing with code typos was not a good use of their time.

Good: Michael Bolin was entirely correct in his criticism (and incidentally, one of our superstar reviewers). The latest beta has an entirely mechanical toolchain that lets us regenerate the entire book output from a cold start by cloning the examples repository. In retrospect, I should have written this infrastructure a year ago, and I’d recommend that any new books of this sort focus hard on automation from the early days.
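To give a flavour of what mechanically regenerating output means, here is a minimal Python sketch that runs an example command and captures its output for inclusion in the text, instead of pasting it in by hand. The function names and the `$ ` prompt format are assumptions for illustration; the book's real toolchain is not shown in this post and certainly differs.

```python
import subprocess
import sys


def run_example(command):
    """Run one example command and return what it printed.

    check=True makes a failing example abort the build, so a broken
    snippet cannot silently end up in the published text.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def render_snippet(command):
    """Format the command and its captured output as a transcript."""
    return "$ " + " ".join(command) + "\n" + run_example(command)


if __name__ == "__main__":
    # Stand-in for a real example from the examples repository.
    print(render_snippet([sys.executable, "-c", "print(1 + 2)"]))
```

The point of the design is that the transcript is never typed by a human: if an example changes or stops working, the next rebuild either updates the text or fails loudly.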

Luckily, my automation scripts could crib heavily from existing open-source OCaml projects that had portions of what we needed, such as utop and ocaml.org (and my thanks to Jeremie Dimino and Christophe Troestler for their help here).

Awesome: We’re hacking on a little surprise for the final online version of the book, based on this build infrastructure. Stay tuned!
