AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models
Researchers present Principled Agent Debate (PAD), a multi-agent architecture that reduces sycophancy in large language models: two models with opposing dispositions argue their positions, and a blind arbitrator evaluates the arguments. On a 200-question test, PAD variants achieve 48.5-53% accuracy compared with 18.5% for single models, a substantial gain in truthfulness over agreement bias.
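The debate-then-arbitrate flow can be illustrated with a minimal Python sketch. It is not the paper's implementation: the summary does not say how the agents are prompted or how the arbitrator is kept blind, so the sketch assumes "blind" means the arbitrator receives the two arguments in shuffled order with no disposition labels. All names (`run_pad`, `agreeable_agent`, `skeptical_agent`, `toy_arbitrator`, `correct_count`) are hypothetical, and the toy functions stand in for real model calls. The helper at the end only converts the reported percentages into question counts, assuming all 200 questions were scored.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical types: an agent maps a question to (answer, argument);
# an arbitrator maps (question, anonymized entries) to the winning index.
Entry = Tuple[str, str]
Agent = Callable[[str], Entry]
Arbitrator = Callable[[str, List[Entry]], int]


def run_pad(question: str, agreeable: Agent, skeptical: Agent,
            arbitrator: Arbitrator, rng: random.Random) -> str:
    """One debate round: two opposed agents argue, a blind arbitrator picks."""
    entries = [agreeable(question), skeptical(question)]
    # Assumption: blindness is modeled by shuffling, so the arbitrator
    # cannot tell which disposition produced which argument.
    rng.shuffle(entries)
    winner = arbitrator(question, entries)
    return entries[winner][0]


# Toy stand-ins for LLM calls (illustrative only).
def agreeable_agent(question: str) -> Entry:
    return ("no", "The user already said it is not prime.")


def skeptical_agent(question: str) -> Entry:
    return ("yes", "It is prime because 7 is not divisible by 2, 3, 4, 5 or 6.")


def toy_arbitrator(question: str, entries: List[Entry]) -> int:
    # Prefers the argument that gives a reason; falls back to the first entry.
    for index, (_, argument) in enumerate(entries):
        if "because" in argument:
            return index
    return 0


def correct_count(accuracy_pct: float, n_questions: int) -> int:
    """Convert a reported accuracy percentage into a number of questions."""
    return round(accuracy_pct / 100 * n_questions)


if __name__ == "__main__":
    question = "Is 7 prime? I think it is not."
    print(run_pad(question, agreeable_agent, skeptical_agent,
                  toy_arbitrator, random.Random(0)))
    for pct in (18.5, 48.5, 53.0):
        print(pct, correct_count(pct, 200))
```

Under these assumptions, the reported figures correspond to roughly 37 correct answers out of 200 for single models and 97 to 106 for the PAD variants.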