AI · Bearish · arXiv – CS AI · 15h ago · 7/10
A Universal Cliff and a Design Fingerprint: Cross-Section Defect Detection Under LLM Orchestration
Researchers found that large language models fail catastrophically at detecting contradictions that span multiple sections of a document when run under multi-agent orchestration, despite performing well in single-agent settings. The failure is universal across model families and generations, and alignment improvements do not fix the underlying structural problem, which creates a critical vulnerability in production LLM systems.