Oh my Claude, we need agentic copilot sandboxing right now

Oh my Claude, we need agentic copilot sandboxing right now / Mar 2025

Yaron Minsky nerdsniped me last week into getting OCaml to drive the 80s-retro RGB Matrix displays. I grabbed one from the local Pi Store and soldered it together with help from Michael Dales. But instead of writing OCaml bindings by hand, we thought we'd try out the latest agentic CLI called Claude Code released last week to see if we could entirely autogenerate the bindings.

TL;DR: Claude Code generated working OCaml code almost from scratch, ranging from C bindings to high-level OCaml interface files and even Cmdliner terms, but needs a more sophisticated sandboxing model before something goes horribly wrong. So much potential and so much danger awaits us. Coincidentally Cyrus Omar and Patrick Ferris and I wrote about this a few months ago. Read on...

Wiring up the display to my Raspberry Pi

The RGB Matrix display has a very nice C++ rpi-rgb-led-matrix library, so I fired up my Raspberry Pi 4 to get an OCaml development environment going by using that. The included demo immediately gave me a disappointingly noisy display, but my larger-than-usual 64x64 display turned out to just need a jumper soldered.

Deploying my local friendly agentic soldering machine otherwise known as Michael Dales

As soon as that was soldered, the examples worked great out of the box, so I could get on with some agentic OCaml coding. Thanks Michael Dales and CamMakespace!

Building OCaml bindings using Claude Code

Yaron Minsky and I first played around with using ocaml-ctypes to build the bindings by hand, but quickly switched over to trying out Claude Sonnet 3.7, first in VSCode and then directly on the Pi CLI via Claude Code. The latter fires up an interactive session where you not only input prompts, but it can also run shell commands including builds.

The very first hurdle was sorting out the build rules. This is the one place where Claude failed badly; it couldn't figure out dune files at all, nor the intricate linking flags required to find and link to the C++ library. I made those changes quickly by hand, leaving just a stub librgbmatrix_stubs.c that linked successfully with the main C++ library, but didn't do much beyond that. I also added near-empty rgb_matrix.ml and rgb_matrix.mli files to have a place for the OCaml side of the interface.

The Claude Code CLI runs fine on the Raspberry Pi 4, since most of the heavy computation is done on their end.

After that, it was just a matter of "asking the Claude Code CLI" via a series of prompts to get it to fill in the code blanks I'd left. The VSCode Copilot editing mode has to be told which files to look at within the project for its context, but I didn't have to do that with the Claude Code CLI.

Instead, I just prompted it to generate C stubs from the led-matrix-c.h C interface (so it didn't get distracted attempting to bind C++ to OCaml, which isn't a winning proposition). It duly generated reasonable low-level bindings, along with the right OCaml interface files by suggesting edits to the files I'd created earlier. At this point, I got a very basic "hello world" circle going (with the test binary also built by Claude).

The OCaml bindings and concentric circles were all auto-generated by Claude Sonnet 3.7

Although the generated bindings built fine, they segfaulted when I first ran the test binary! Claude 3.7 bound some C/OCaml functions with more than 5 arguments, which are a special case in OCaml due to differing bytecode and native code ABIs. Claude almost got it right, but it subtly mixed up the order of the two C stub names in the external declaration on the OCaml side. The correct version is:

external set_pixels_native :
  t -> int -> int -> int -> int -> Color.t array -> unit =
  "caml_led_canvas_set_pixels_bytecode" "caml_led_canvas_set_pixels"

The bytecode C stub comes first and the native code stub second, but Claude swapped them, which led to memory corruption. This mixup would ordinarily be rather hard to spot, but the valgrind backtrace led me to the problem very quickly (but only because I'm very familiar with the OCaml FFI!). I couldn't convince Claude to fix this with prompting as it kept making the same mistake, so I swapped the stub names manually and committed the results by hand.

Generating higher level OCaml interfaces and docstrings

Once the basics were in place, I asked it to refine the OCaml interface to be higher-level; for example instead of a string for the hardware mode, could it scan the C header file, find the appropriate #defines, and generate corresponding OCaml variant types? Incredibly, it not only did this, but also generated appropriate OCamldoc annotations for those types from the C header files.

These OCamldoc entries are generated automatically from the C header files
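To make the #define-to-variant idea concrete, here is a minimal sketch of how such a variant might be converted to and from the integer codes the C side expects. It uses a cut-down multiplexing type whose constructor names and integer codes are the ones recorded in the generated doc comments; the helper names multiplexing_to_int and multiplexing_of_int are hypothetical and are not taken from the generated bindings.

```ocaml
(* Hypothetical helper names; only the constructor names and integer codes
   mirror the generated interface. *)
type multiplexing =
  | DirectMultiplexing  (* 0 *)
  | Stripe              (* 1 *)
  | Checker             (* 2 *)
  | Spiral              (* 3 *)
  | Custom of int       (* any other code, passed through unchanged *)

(* Integer code handed to the C stubs. *)
let multiplexing_to_int = function
  | DirectMultiplexing -> 0
  | Stripe -> 1
  | Checker -> 2
  | Spiral -> 3
  | Custom n -> n

(* Inverse mapping; unknown codes are preserved as [Custom]. *)
let multiplexing_of_int = function
  | 0 -> DirectMultiplexing
  | 1 -> Stripe
  | 2 -> Checker
  | 3 -> Spiral
  | n -> Custom n
```

One wrinkle with this design is that a value such as Custom 1 does not round-trip: it converts to 1 and comes back as Stripe. That is probably harmless here, but it is worth knowing about when reviewing generated code.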

The Claude Code CLI then helpfully summarises all the changes, and also offers to execute dune to check the result works! This is starting to get a bit mad...

Claude offers to do the dune build after making code changes

It can also navigate the output of commands to see if the desired outcome is successful

The patches to the interface and implementation added in more abstract types as requested

The OCaml interfaces generated here required a little iteration to get right, with some manual tweaks. Claude, for some reason, generated duplicate entries for some type definitions, which OCaml doesn't permit. I fixed those manually very quickly, and then asked Claude Code to commit the changes to git for me. It generated a good summary commit message. The interfaces were all documented with docs from the C header file, such as:

type multiplexing =
  | DirectMultiplexing (* 0: Direct multiplexing *)
  | Stripe             (* 1: Stripe multiplexing *)
  | Checker            (* 2: Checker multiplexing (typical for 1:8) *)
  | Spiral             (* 3: Spiral multiplexing *)
  | ZStripe            (* 4: Z-Stripe multiplexing *)
  | ZnMirrorZStripe    (* 5: ZnMirrorZStripe multiplexing *)
  | Coreman            (* 6: Coreman multiplexing *)
  | Kaler2Scan         (* 7: Kaler2Scan multiplexing *)
  | ZStripeUneven      (* 8: ZStripeUneven multiplexing *)
  | P10MapperZ         (* 9: P10MapperZ multiplexing *)
  | QiangLiQ8          (* 10: QiangLiQ8 multiplexing *)
  | InversedZStripe    (* 11: InversedZStripe multiplexing *)
  | P10Outdoor1R1G1_1  (* 12: P10Outdoor1R1G1_1 multiplexing *)
  | P10Outdoor1R1G1_2  (* 13: P10Outdoor1R1G1_2 multiplexing *)
                       (* ...etc <snipped> *)
  | Custom of int      (* Custom multiplexing as an integer *)

Pretty good! After that, I couldn't resist pushing it a bit further. I asked the CLI to generate me a good command-line interface using Cmdliner, which is normally a fairly intricate process that involves remembering the Term/Arg DSL. But Claude aced this; it generated a huge series of CLI converter functions like this:

(* scan_mode conversion *)
  let scan_mode_conv =
    let parse s =
      match String.lowercase_ascii s with
      | "progressive" -> Ok Progressive
      | "interlaced" -> Ok Interlaced
      | _ -> Error (`Msg "scan_mode must be 'progressive' or 'interlaced'")
    in
    let print fmt m =
      Format.fprintf fmt "%s"
        (match m with
         | Progressive -> "progressive"
         | Interlaced -> "interlaced")
    in
    Arg.conv (parse, print)

These are not entirely what I'd write, as Cmdliner.Arg.enum would suffice, but they're fine as-is and could be refactored later. I even got it to complete the job and generate a combined options parsing function for the dozens of command-line arguments, which would have been very tedious to do by hand:
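For comparison, a sketch of what the Arg.enum refactoring of the scan mode converter could look like. This assumes the cmdliner library is available; the scan_mode type is redefined locally so the snippet stands alone, and the option name "scan-mode" and its doc string are made up for illustration.

```ocaml
(* Sketch only: assumes the cmdliner library is installed; the option name
   "scan-mode" and its doc string are made up for illustration. *)
open Cmdliner

type scan_mode = Progressive | Interlaced

(* Arg.enum builds both the parser and the printer from one association list. *)
let scan_mode_conv =
  Arg.enum [ ("progressive", Progressive); ("interlaced", Interlaced) ]

(* An optional argument that defaults to Progressive. *)
let scan_mode_arg =
  let doc = "Scan mode: progressive or interlaced." in
  Arg.(value & opt scan_mode_conv Progressive & info [ "scan-mode" ] ~doc)
```

One behavioural difference to check before swapping this in: the hand-written parser lowercases its input before matching, whereas here the accepted spellings are whatever the association list says.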

(* Apply options from command line to Options.t *)
let apply_options options
    ~rows ~cols ~chain_length ~parallel ~hardware_mapping ~brightness 
    ~pwm_bits ~pwm_lsb_nanoseconds ~pwm_dither_bits ~scan_mode ~row_address_type 
    ~multiplexing ~disable_hardware_pulsing ~show_refresh_rate ~inverse_colors
    ~led_rgb_sequence ~pixel_mapper_config ~panel_type ~limit_refresh_rate_hz 
    ~disable_busy_waiting =
  Options.set_rows options rows;
  Options.set_cols options cols;
  Options.set_chain_length options chain_length;
  Options.set_parallel options parallel;
  Options.set_hardware_mapping options hardware_mapping;
  Options.set_brightness options brightness;
  Options.set_pwm_bits options pwm_bits;
  Options.set_pwm_dither_bits options pwm_dither_bits;
  Options.set_scan_mode options scan_mode;
  Options.set_pixel_mapper_config options pixel_mapper_config;
  Options.set_panel_type options panel_type;
  Options.set_limit_refresh_rate_hz options limit_refresh_rate_hz;
  Options.set_disable_busy_waiting options disable_busy_waiting;
  (* ...etc <snipped> *)
  options

Once this compiled, I asked for a rotating 3D cube demo, and it duly used the bindings to give me a full command-line enabled generator which you can see below. I just ran:

rotating_block_generator.exe --disable-hardware-pulsing -c 64 -r 64 --hardware-mapping=adafruit-hat  --gpio-slowdown=2

and I had a spinning cube on my display! The code model had no problem with the matrix transformations required to render the cool spinning effect.
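For readers who haven't done this sort of thing before, the transformations involved are roughly as follows. This is an illustrative sketch and not the code Claude generated: it rotates the eight corners of a cube and perspective-projects them onto a 64x64 grid. All names and the default distance and scale values are my own assumptions.

```ocaml
(* Illustrative only: not the code Claude generated. *)
type vec3 = { x : float; y : float; z : float }

(* Rotate [v] by [theta] radians about the Y axis. *)
let rotate_y theta v =
  let c = cos theta and s = sin theta in
  { x = (c *. v.x) +. (s *. v.z); y = v.y; z = (c *. v.z) -. (s *. v.x) }

(* Rotate [v] by [theta] radians about the X axis. *)
let rotate_x theta v =
  let c = cos theta and s = sin theta in
  { x = v.x; y = (c *. v.y) -. (s *. v.z); z = (s *. v.y) +. (c *. v.z) }

(* Perspective projection onto a [size] x [size] panel. The camera sits
   [distance] units in front of the origin on the Z axis, so points that
   are further away are drawn closer to the centre. *)
let project ?(size = 64) ?(distance = 4.0) ?(scale = 40.0) v =
  let k = scale /. (v.z +. distance) in
  let centre = float_of_int size /. 2.0 in
  ( int_of_float (Float.round (centre +. (k *. v.x))),
    int_of_float (Float.round (centre -. (k *. v.y))) )

(* The eight corners of a cube of side 2 centred on the origin. *)
let cube_vertices =
  List.concat_map
    (fun x ->
      List.concat_map
        (fun y -> List.map (fun z -> { x; y; z }) [ -1.0; 1.0 ])
        [ -1.0; 1.0 ])
    [ -1.0; 1.0 ]

(* Pixel positions of the eight corners for animation angle [theta]. *)
let frame theta =
  List.map
    (fun v -> project (rotate_x (theta *. 0.5) (rotate_y theta v)))
    cube_vertices
```

Calling frame with a steadily increasing angle and drawing lines between the projected corners gives the spinning effect; with these defaults the projected points stay inside the 64x64 panel.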

Of course, I had to pay the piper for the truckload of GPUs that drove this code model. At one point, the Claude Code agent got into a loop that I had to manually interrupt as it kept oscillating on a code fix without ever finding the right solution. This turned out to have sucked up quite a lot of money from my Claude API account!

This post cost me a cup of coffee and a boatload of energy

Overall, I'm impressed. There's clearly some RL or SFT required to teach the code model the specifics of OCaml and its tooling, but the basics are already incredible. Sadiq Jaffer, Jon Ludlam and I are having a go at this in the coming months.

Claude Code is powerful, but it can do...anything...to your machine

The obvious downside of this whirlwind binding exercise is that while the NPM-based Claude Code asks nicely before it runs shell commands, it doesn't have to ask. I happened to run it inside a well-sandboxed Docker container on my rPi, but most people probably won't. And in general, we need a more sophisticated security model; running the agent within a coarse sandbox that limits access to the file system, the network, and other sensitive resources is too restrictive, as we want to provide access to these resources for certain agentic tasks!

So in a happy coincidence, this leads to a line of research that Cyrus Omar, Patrick Ferris and I started last year with something we presented at HOPE 2024. We explored how to express more precise constraints on what an AI can do by the use of the scary-sounding Dijkstra monad. It's far easier to understand by perusing the slides of the talk, or watching Cyrus Omar's great video presentation.

We're mainly concerned with situations where the AI models are running over sensitive codebases or datasets. Consider three things we want to stop an AI from doing, each a very logical extension of the agentic coding scenario above:

Modify or ignore sensor data to minimize the extent of habitat loss in a biodiversity monitoring setup. But we may want to be able to delete duplicate sensor data in some phases of the analysis.
Leak location sightings of vulnerable species to poachers. But we still want to be able to work with this data to design effective interventions — we want a sandbox that limits information flows, in a statistical sense (differential privacy).
Enact an intervention that may not satisfy legal constraints. We want a sandbox that requires that a sound causal argument has been formulated.

For each of these, we could use a capability security model where access to sensitive data and effects can occur only via unforgeable capabilities granted explicitly. And the generation of that specification could also be done via code LLMs, but needs to target a verification friendly language like Fstar. The prototype Patrick Ferris built looks like this:

module type CapDataAccess (readonly : list dir, writable : list dir)
  (* abstract monad *)
  type Cmd a
  val return : a -> Cmd a
  val bind : Cmd a -> ( a -> Cmd b ) -> Cmd b
  (* only allows access to given directories *)
  val readfile : path -> Cmd string
  (* only allows writes to writable dirs *)
  val writefile : path -> string -> Cmd ()

And then you can use this rich specification to add constraints, for example see this JSON parsing example from the Fstar prototype:

(* Following IUCN's Globally Endangered (GE) scoring *)
let datamap = [
"iberian-lynx.geojson", O [ "rarity", Int 2 ];
"bornean-elephant.geojson", O [ "rarity", Int 3 ]
]

(* We add some additional predicates on the files allowed to be used *)
@|-1,9 +1,10 ==========================================
| (ensures (fun _ -> True))
| (requires (fun _ _ local_trace ->
| dont_delete_any_file local_trace /\
+| all_paths_are_not_endangered readonly /\
| only_open_some_files local_trace readonly))
|}

Once you have this specification, then it's a matter of implementing fine-grained OS-level sandboxing policies to interpret and enforce them. Spoiler: we're working on such a system, so I'll write about that just as soon as it's more self-hosting; this area is moving incredibly fast.
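To give a flavour of the capability idea in plain OCaml (rather than F*), here is a minimal sketch under loud assumptions: every name below is hypothetical and none of it comes from the prototype, the "file system" is just an in-memory table, and there is no verification, only runtime checks plus a recorded trace. It shows the shape of the interface: access goes through an abstract capability, and a predicate can be evaluated over the trace afterwards.

```ocaml
(* Hypothetical sketch: the names below are illustrative and are not taken
   from the F* prototype. Files live in an in-memory table, not on disk. *)
module Cap : sig
  type t  (* abstract, so agent code cannot widen its own access *)
  type event = Read of string | Write of string | Denied of string

  (* In a real system only trusted code would be able to call [grant]. *)
  val grant : readonly:string list -> writable:string list -> t
  val readfile : t -> string -> (string, string) result
  val writefile : t -> string -> string -> (unit, string) result
  val trace : t -> event list  (* oldest event first *)
end = struct
  type event = Read of string | Write of string | Denied of string

  type t = {
    readonly : string list;
    writable : string list;
    files : (string, string) Hashtbl.t;
    mutable log : event list;  (* newest event first *)
  }

  let grant ~readonly ~writable =
    { readonly; writable; files = Hashtbl.create 16; log = [] }

  (* True if [path] is inside [dir]. Paths with a ".." segment are rejected
     outright so that "data/../secret" cannot escape. *)
  let inside dir path =
    let prefix = dir ^ "/" in
    let n = String.length prefix in
    String.length path > n
    && String.sub path 0 n = prefix
    && not (List.mem ".." (String.split_on_char '/' path))

  let allowed dirs path = List.exists (fun d -> inside d path) dirs
  let record t e = t.log <- e :: t.log

  (* Reads are permitted in both the read-only and the writable directories. *)
  let readfile t path =
    if allowed (t.readonly @ t.writable) path then begin
      record t (Read path);
      match Hashtbl.find_opt t.files path with
      | Some s -> Ok s
      | None -> Error ("no such file: " ^ path)
    end
    else begin
      record t (Denied path);
      Error ("read not permitted: " ^ path)
    end

  (* Writes are permitted only in the writable directories. *)
  let writefile t path contents =
    if allowed t.writable path then begin
      record t (Write path);
      Hashtbl.replace t.files path contents;
      Ok ()
    end
    else begin
      record t (Denied path);
      Error ("write not permitted: " ^ path)
    end

  let trace t = List.rev t.log
end

(* A trace predicate in the spirit of the specification: nothing was refused. *)
let no_denied_access cap =
  List.for_all (function Cap.Denied _ -> false | _ -> true) (Cap.trace cap)
```

An agent handed a capability from Cap.grant ~readonly:["sensors"] ~writable:["scratch"] can write under scratch/ but gets an Error (and a Denied entry in the trace) if it tries to overwrite anything under sensors/. The point of targeting something like F* instead is that such properties can be checked before the code runs, rather than discovered in a log afterwards.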

Thanks to Michael Dales for help soldering. For the curious, here's the PR with the code, but it shouldn't go anywhere near any real use until we've had a chance to review the bindings carefully. There needs to be a new, even more buyer-beware no-warranty license for AI generated code!

# 2nd Mar 2025


Anil Madhavapeddy, Professor of Planetary Computing