🧠 AI | 🔴 Bearish | Importance 7/10

$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

arXiv – CS AI | Soham Ray, Keshav Dhandhania, Victor Barres, Karthik Narasimhan | March 17, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals a large performance gap: under realistic conditions with noise and diverse accents, voice agents retain only 30-45% of the task-completion capability of a text-based agent.

Key Takeaways

→ Voice AI agents perform significantly worse than text-based systems, retaining only 30-45% of text capability under realistic conditions.
→ In text, GPT-5 with reasoning achieves 85% task completion, while voice agents reach only 31-51% under clean audio conditions.
→ Voice agent task completion drops further to 26-38% under realistic audio conditions with noise and diverse accents.
→ 79-90% of voice agent failures stem from agent behavior rather than from technical limitations of the evaluation setup.
→ The τ-voice benchmark provides the first comprehensive framework for testing voice agents on complex, grounded real-world tasks.
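
The "30-45% of text capability" figure can be sanity-checked from the completion rates quoted above. The following is a minimal sketch, assuming retention is simply the voice completion rate divided by the text completion rate; that definition and the names `retention` and `retention_range` are illustrative assumptions, not taken from the paper.

```python
# Sketch only: assumes "retention" = voice completion rate / text completion rate.
# The percentages are the ones quoted in the summary; function names are hypothetical.

TEXT_PCT = 85.0                  # GPT-5 reasoning, text task completion (%)
CLEAN_RANGE = (31.0, 51.0)       # voice agents, clean audio (%)
REALISTIC_RANGE = (26.0, 38.0)   # voice agents, noise and diverse accents (%)


def retention(voice_pct: float, text_pct: float) -> float:
    """Fraction of the text agent's completion rate that a voice agent retains."""
    if text_pct <= 0:
        raise ValueError("text_pct must be positive")
    return voice_pct / text_pct


def retention_range(voice_range, text_pct=TEXT_PCT):
    """Apply retention() to the low and high ends of a reported range."""
    low, high = voice_range
    return retention(low, text_pct), retention(high, text_pct)


if __name__ == "__main__":
    for label, rng in (("clean", CLEAN_RANGE), ("realistic", REALISTIC_RANGE)):
        low, high = retention_range(rng)
        print(f"{label}: {low:.1%} to {high:.1%} of text performance")
```

Under this assumption, the realistic-condition range of 26-38% works out to roughly 31-45% of the 85% text baseline, which is close to the 30-45% quoted in the summary.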

Mentioned in AI

Models

GPT-5 (OpenAI)