AIBearish · arXiv – CS AI · 10h ago · 7/10
🧠
Insider Attacks in Multi-Agent LLM Consensus Systems
Researchers demonstrate that malicious insider agents can disrupt agreement formation in multi-agent LLM consensus systems. Using reinforcement-learning attack policies trained on surrogate world models, the attackers substantially reduce consensus rates among benign agents, exposing a critical vulnerability in decentralized AI systems that assume all participants are aligned.
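To make the failure mode concrete, here is a minimal toy sketch (not the paper's RL attacker) of an averaging-style consensus protocol: with truthful reports the benign agents agree in one round, while a hypothetical insider that tailors an extreme report to each recipient keeps them permanently split.

```python
import random

def simulate(n_agents=6, rounds=30, eps=0.05, insider=False, seed=0):
    """Toy averaging consensus with an optional insider.

    Benign agents update to the mean of all reported opinions. With truthful
    reports, everyone computes the same mean and agrees after one round. The
    insider (a hypothetical hand-crafted policy, standing in for the paper's
    learned attacker) reports 1.0 to agents above the benign mean and 0.0 to
    those below, holding a gap of 1/n_agents open indefinitely.
    """
    rng = random.Random(seed)
    benign = list(range(n_agents - 1)) if insider else list(range(n_agents))
    opinions = {i: rng.random() for i in benign}
    for _ in range(rounds):
        mean_benign = sum(opinions.values()) / len(benign)
        new = {}
        for i in benign:
            reports = [opinions[j] for j in benign]  # truthful reports
            if insider:
                # per-recipient extreme: push agent i away from the group
                reports.append(1.0 if opinions[i] > mean_benign else 0.0)
            new[i] = sum(reports) / len(reports)
        opinions = new
        spread = max(opinions.values()) - min(opinions.values())
        if spread < eps:
            return True  # benign agents reached consensus
    return False

def consensus_rate(insider, trials=100):
    """Fraction of random initializations that reach consensus."""
    return sum(simulate(insider=insider, seed=s) for s in range(trials)) / trials
```

Even this crude fixed strategy drives the consensus rate to zero in the toy model; the paper's contribution is that an RL policy trained against a surrogate world model can discover comparably disruptive behavior without such hand-crafting.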