AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Goal-Autopilot: A Verifiable Anti-Fabrication Firewall for Unattended Long-Horizon Agents
Researchers introduce Autopilot, an execution framework for long-horizon LLM agents that uses a verifiable finite-state machine architecture to prevent false success claims. Across 3,150 test cases, Autopilot's fabrication rate is 0.95%, compared with 8.10% and 25.05% for two competing systems, and the authors report large improvements on complex software engineering benchmarks.