Abstract. Emerging agentic AI systems have the potential to substantially accelerate progress on critical scientific and societal problems, but they also present substantial privacy, security, and safety challenges because they execute commands autonomously or semi-autonomously in environments where they have access to sensitive data and effects.
Formal methods are a key approach to enforcing mathematically rigorous safeguards that limit the ability of agentic AIs to cause these kinds of harms. In particular, we envision enforcing AI capability safety policies that impose strict constraints of various kinds on the effects of an AI agent. Specifying AI capability safety policies able to enforce these kinds of constraints in practical settings will necessarily be a large-scale, collaborative effort. In particular, it will require (1) employing a wide variety of approaches to specification and proof; (2) developing large-scale world models encompassing organizational access control and information flow models, legal models, and more general causal models of the world; and (3) developing robust AI safety policies and specifications that are likely to minimize the risk of catastrophic harm from future AI agents.
The proposed talk will outline the vision for a recently originated research project aiming to build a formally verified prototype of a foundational ``operating system'' for safeguarded AI, called Bastion, that grounds these activities within programming language theory, namely by combining dependent type theory (as a practical general-purpose theory of computational structures and proofs), Dijkstra monads (as a flexible formalism for reasoning about an AI agent’s computational effects), and abstract types (for modularizing reasoningto individual components that can be separately developed by various stakeholders, including AI safety researchers, formal methods experts, organizations of various scales, and governments seeking to develop actionable, specific policy).
Authors. Cyrus Omar, Patrick Ferris and Anil Madhavapeddy
See Also. This publication was part of the Planetary Computing project.