This note was published on 4th Sep 2024.

A very fun talk at ACM HOPE 2024 on some new work with Cyrus Omar and Patrick Ferris on how we can formally specify systems so that they are robust to code generation by AI agents. For instance, if you ask GitHub Copilot to generate code to filter endangered animals out of a folder of images, it might interpret that as deleting the images, moving them to another folder (which might be public), or just removing them from the index. Any of those options is potentially valid, so what do we do? Our idea is to use F* to specify a rich set of allowable behaviours which can then be dynamically enforced in less expressive languages, offering layers of protection against over-eager (or rogue) AI agents.
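
To give a rough flavour of the "dynamic enforcement" half of the idea, here is a minimal sketch in OCaml (standing in for a less expressive target language; the specifications in the talk are written in F*, which this is not). Every type and function name below is a hypothetical illustration, not the talk's actual API: an agent's proposed action is checked against a policy of allowable behaviours before it is executed.

```ocaml
(* Hypothetical sketch: dynamically enforcing a set of allowable
   behaviours against actions proposed by an AI agent. *)

(* The actions an agent might propose when asked to "filter out"
   an image from a folder. *)
type action =
  | Delete of string                 (* remove the file entirely *)
  | Move of string * string          (* move the file to another folder *)
  | Remove_from_index of string      (* keep the file, hide it from listings *)

(* A policy is a predicate over proposed actions: the dynamic check
   standing in for the richer F* specification. *)
type policy = action -> bool

(* Example policy: never delete, and only move into folders known
   to be private. *)
let private_folders = ["/archive/private"]

let conservative_policy : policy = function
  | Delete _ -> false
  | Move (_, dst) -> List.mem dst private_folders
  | Remove_from_index _ -> true

(* Enforcement wrapper: run the agent's proposed action only if the
   policy allows it, otherwise refuse. *)
let enforce (p : policy) (execute : action -> unit) (a : action) =
  if p a then execute a
  else prerr_endline "policy violation: action refused"

(* Usage: an over-eager agent proposes deletion; the wrapper blocks it,
   but the harmless de-indexing action goes through. *)
let () =
  let execute = function
    | Delete f -> Printf.printf "deleting %s\n" f
    | Move (f, d) -> Printf.printf "moving %s to %s\n" f d
    | Remove_from_index f -> Printf.printf "unindexing %s\n" f
  in
  enforce conservative_policy execute (Delete "rhino.jpg");
  enforce conservative_policy execute (Remove_from_index "rhino.jpg")
```

The point of the sketch is only the shape of the mechanism: the agent never calls the effectful operations directly, it can only propose actions that pass through the enforcement layer.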

We'll increasingly need these sorts of protections in our systems as natural language interfaces are adopted for programming, and the traditional 'yes or no' access control policies we use right now will be insufficient.

Read more in: