Theory of Code Space: Do Code Agents Understand Software Architecture?
Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.