This note was published on 8th Apr 2013.

A regular question that comes up from OCaml developers is how to use OPAM as a hypothesis testing tool against the known corpus of OCaml source code. In other words: can we quickly and simply run grep over every source archive in OPAM? So that’s the topic of today’s 5 minute blog post:

git clone git://github.com/ocaml/opam-repository
cd opam-repository
opam-admin make
cd archives
for i in *.tar.gz; \
  do tar -zxOf $i | grep caml_stat_alloc_string; \
done

In this particular example we’re looking for instances of caml_stat_alloc_string, so just replace that with the regular expression of your choice. The opam-admin tool repacks upstream archives into a straightforward tarball, so you don’t need to worry about all the different archival formats that OPAM supports (such as git or Darcs). It just adds an archive directory to a normal opam-repository checkout, so you can reuse an existing checkout if you have one already.

$ cd opam-repository/archives
$ du -h
669M    .
$ ls | wc -l
2092