The fine folks at O’Reilly have been proof-reading the Real World OCaml book that I’ve been working on. My heart leapt with joy when the copyeditor commented that she thought it was very well written, but my joy was short-lived when it turned out that all her comments were safely ensconced as PDF comments. A few hundred comments that required a response from us, too.
“No problem! MacOS X Preview can handle these!” was of course my first response, but it turns out it’s totally broken with PDF notes. The first note that you select will appear as the content of all the other subsequent notes. Yaron Minsky then experimented with Adobe Acrobat, which I’ve sworn never to install again after an unfortunate incident involving the uninstaller a couple of years ago. That turned out to be incredibly slow. I tried a few open-source tools such as Skim which, while otherwise an excellent bit of software, couldn’t render these particular annotations.
I literally jumped off my seat upon discovering cpdf.
Meanwhile, John Whitington just announced the release of the Coherent PDF command-line tools. Since these are all written in OCaml (and have been developed over quite a few years now), he also sent in an OPAM pull request to add it to the database. And most conveniently, this ended up solving my little PDF conundrum in less than an hour of hacking, and has almost cured me of my lifelong fear of dealing with anything in a PDF-like format. Here’s what I did:
I installed the tools via opam install cpdf. This installed the library but not the binary (swiftly fixed).
Reading the license told me that it’s for non-commercial use only, so I bought a license from the Coherent PDF website (a bargain price, given how much it does!).
I ran cpdf -list-annotations over the PDF, and it dumped all the comments as text to stdout. This wasn’t quite enough for me, since I needed to match each annotation to a page number. But since John has released it as open-source, I forked the repository, patched page-number support directly into the command-line tools, and sent a pull request back over to John. Since it’s under a non-standard license, I decided to place my patch in the public domain to make it easier for him to accept it if he chooses.
My co-authors can just run opam pin cpdf git://github.com/avsm/cpdf-source#annotation-page-numbers to pin their local copy of CPDF to my forked branch in their own OPAM installations, and easily use my copy until John gets a chance to integrate my changes properly upstream.
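For anyone wanting to repeat the annotation dump from the steps above, here is a minimal shell sketch. It assumes cpdf is already installed and on the PATH. The function name dump_annotations and the file names proofs.pdf and annotations.txt are made up for illustration. Only the -list-annotations flag comes from the steps above, and the exact layout of the text it prints may differ between cpdf versions.

```shell
#!/bin/sh
# Sketch only: dump_annotations, proofs.pdf and annotations.txt are
# hypothetical names, not part of cpdf itself.

# Write every annotation that cpdf finds in a PDF to a plain text file,
# so the comments can be searched and answered in an ordinary editor.
dump_annotations() {
    pdf="$1"   # input PDF containing the reviewer's comments
    out="$2"   # text file that receives the dump
    cpdf -list-annotations "$pdf" > "$out"
}

# Example (assumes cpdf is on the PATH):
#   dump_annotations proofs.pdf annotations.txt
```

Redirecting to a file rather than reading stdout directly keeps a copy of the comments around while working through the responses.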
Total time including this blog post: 40 minutes. Now, on to fixing the author response comments for Real World OCaml. I’m so happy to have cpdf as a simple, hackable PDF utility, as it does things like page combining and rotations that have always been a little flaky in other tools for me. It’s the Pandoc of PDFs!