AINeutralarXiv – CS AI · 15h ago7/10
🧠
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents
Researchers introduce QUACK, an evaluation framework for auditing whether AI agents in social deduction games actually ground their language in perceived reality or hallucinate claims. Testing three frontier vision-language models reveals that even top performers hallucinate 15% of spatial claims and make accusations without evidence, exposing critical gaps in agent reasoning reliability.