
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

arXiv – CS AI | Chirag Parmar, Akshat Mehta, Henglin Wu, Jagadish Ramamurthy, Shweta Medhekar | June 3, 2026 at 04:00 AM

🤖AI Summary

Researchers identify when multi-agent debate helps or hurts data cleaning tasks, finding it degrades generation quality but improves error detection. They establish a mathematical condition predicting debate effectiveness and demonstrate that adversarial separation with code-execution grounding can overcome critique-induced confusion, achieving the first significant improvement on generative tasks.

Analysis

This research addresses a fundamental question in AI system design: when do collaborative agent approaches improve outcomes, and when do they introduce noise? The study reports a seemingly paradoxical result: multi-agent debate worsens generation quality across all tested models while substantially improving error detection. The degradation stems from critique-induced confusion, in which generators uncritically accept hallucinated feedback from critics, lowering output quality by 1.6 to 15.5 percentage points.

The researchers go beyond documenting the problem by deriving a mathematical condition that predicts when debate provides a net benefit: the probability of rescuing incorrect outputs must exceed the probability of corrupting correct ones. Their factorial experiments show that self-verification fails entirely, whereas adversarial separation, with independent critics grounded in code execution and evidence-gated generation, reverses the degradation and yields a statistically significant 5.3 percentage point improvement on generative tasks.

The finding has broader implications for AI system design: the effectiveness of agent collaboration depends critically on structural independence and grounding mechanisms, not simply on adding more agents. The framework correctly predicts outcomes across nine task types and is validated against 19 published comparisons spanning seven domains with zero false positives, which suggests it generalizes well. For practitioners building multi-agent systems, the research offers actionable design principles: collaboration adds value only when agents operate with genuine independence, have access to objective verification mechanisms, and exchange information in a controlled way that prevents uncritical acceptance of uncertain guidance.
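The benefit condition described in the analysis can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the paper's formula: the summary gives the condition only in words, so the weighting by baseline accuracy and the names `net_debate_gain`, `debate_helps`, `p_rescue`, and `p_corrupt` are hypothetical. It assumes a debate round flips an incorrect output to correct with probability `p_rescue` and a correct output to incorrect with probability `p_corrupt`.

```python
def net_debate_gain(baseline_accuracy: float, p_rescue: float, p_corrupt: float) -> float:
    """Expected change in accuracy after one debate round (illustrative model).

    Assumption (not taken from the paper): outputs that start wrong are
    rescued with probability p_rescue, and outputs that start correct are
    corrupted with probability p_corrupt.
    """
    for name, value in (
        ("baseline_accuracy", baseline_accuracy),
        ("p_rescue", p_rescue),
        ("p_corrupt", p_corrupt),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    share_wrong = 1.0 - baseline_accuracy
    gained = share_wrong * p_rescue          # wrong outputs that debate fixes
    lost = baseline_accuracy * p_corrupt     # correct outputs that debate breaks
    return gained - lost


def debate_helps(baseline_accuracy: float, p_rescue: float, p_corrupt: float) -> bool:
    """True when the expected gain from rescues outweighs the loss from corruptions."""
    return net_debate_gain(baseline_accuracy, p_rescue, p_corrupt) > 0.0
```

Under this model, a strong generator is easy to hurt: with a baseline accuracy of 0.8, a rescue rate of 0.3 is outweighed by a corruption rate of only 0.1, because there are far more correct outputs to corrupt than incorrect ones to rescue.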

Key Takeaways

→Multi-agent debate degrades generation quality across all models but significantly improves error detection, creating opposing effects requiring careful optimization.
→Critique-induced confusion, in which generators accept hallucinated feedback without verification, is the primary mechanism degrading performance.
→A mathematical benefit condition predicting debate effectiveness was validated across nine task types with zero false positives on 19 published comparisons.
→Adversarial separation combined with code-execution grounding and evidence-gated generation is essential for debate to exceed single-agent performance on generative tasks.
→The research provides a generalizable framework for designing effective multi-agent collaboration systems beyond data cleaning applications.
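
To illustrate what evidence-gated generation with code-execution grounding might look like in a data cleaning setting, here is a minimal sketch. It is not the authors' implementation: the `Critique` type, its `check` and `fix` fields, and `evidence_gated_revision` are hypothetical names chosen for this example. The assumption is that each critique carries an executable check, and the generator applies a fix only when that check, run against the actual data, confirms the claimed error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Critique:
    """A critic's claim plus executable evidence (hypothetical structure)."""
    claim: str
    check: Callable[[List[Row]], bool]      # True if the claimed error is really present
    fix: Callable[[List[Row]], List[Row]]   # proposed repair, returns new rows


def evidence_gated_revision(
    rows: List[Row], critiques: List[Critique]
) -> Tuple[List[Row], List[str]]:
    """Apply only those fixes whose check is confirmed by running it on the data."""
    current = rows
    accepted: List[str] = []
    for critique in critiques:
        try:
            confirmed = bool(critique.check(current))
        except Exception:
            # A check that crashes provides no evidence, so the critique is ignored.
            confirmed = False
        if confirmed:
            current = critique.fix(current)
            accepted.append(critique.claim)
    return current, accepted


# Example: one grounded critique and one hallucinated critique.
rows = [{"age": 34}, {"age": -5}]

negative_age = Critique(
    claim="negative age",
    check=lambda rs: any(isinstance(r["age"], int) and r["age"] < 0 for r in rs),
    fix=lambda rs: [
        dict(r, age=None) if isinstance(r["age"], int) and r["age"] < 0 else r
        for r in rs
    ],
)

hallucinated = Critique(
    claim="ages stored as strings",
    check=lambda rs: any(isinstance(r["age"], str) for r in rs),
    fix=lambda rs: [dict(r, age=0) for r in rs],  # would corrupt valid rows
)

cleaned, accepted = evidence_gated_revision(rows, [hallucinated, negative_age])
```

In this example the hallucinated critique is rejected because its check finds no string-valued ages, so its destructive fix never runs, while the grounded critique about the negative age is applied. The gate targets the corruption pathway directly: feedback that cannot be confirmed by execution never reaches the output.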