$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains
Researchers introduce $\tau$-Voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps: under realistic conditions with noise and diverse accents, voice agents achieve only 30-45% of the capability of text-based AI.