y0news
🧠 AI · 🔴 Bearish · Importance 7/10

τ-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

arXiv – CS AI | Soham Ray, Keshav Dhandhania, Victor Barres, Karthik Narasimhan
🤖 AI Summary

Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps: under realistic conditions with background noise and diverse accents, voice agents retain only 30-45% of the task-completion performance of text-based agents.

Key Takeaways
  • Voice AI agents perform significantly worse than text-based systems, retaining only 30-45% of text-mode task-completion performance under realistic conditions.
  • GPT-5 (with reasoning) achieves 85% task completion in text mode, while voice agents reach only 31-51% even under clean audio conditions.
  • Performance drops further to 26-38% when realistic audio conditions with noise and diverse accents are introduced.
  • 79-90% of voice agent failures stem from the agent's own behavior rather than from limitations of the evaluation setup.
  • The τ-voice benchmark provides the first comprehensive framework for testing voice agents on complex, grounded real-world tasks.
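The 30-45% retention figure can be roughly reproduced from the other takeaways if one assumes that retention is simply the voice success rate divided by the 85% GPT-5 text baseline. The sketch below is a hypothetical check under that assumption, not the paper's own metric definition; the `retention` helper, `TEXT_BASELINE`, and `CONDITIONS` are illustrative names only.

```python
# Hypothetical check, assuming "retention" = voice success rate / text success rate,
# with GPT-5's 85% text result used as the single shared baseline.
TEXT_BASELINE = 0.85  # GPT-5 (reasoning) task completion in text mode


def retention(voice_rate: float, text_rate: float = TEXT_BASELINE) -> float:
    """Fraction of text-mode task completion retained by a voice agent."""
    return voice_rate / text_rate


# Voice-agent success-rate ranges reported in the summary.
CONDITIONS = {
    "clean": (0.31, 0.51),
    "realistic (noise + accents)": (0.26, 0.38),
}

for label, (low, high) in CONDITIONS.items():
    # Print each range as a percentage of text-mode performance.
    print(f"{label}: {retention(low):.0%} to {retention(high):.0%} of text performance")
```

Under this assumption the realistic-condition range works out to roughly 31% to 45% of text performance, close to the reported 30-45%, while clean audio corresponds to roughly 36% to 60%.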