y0news

#voice-ai News & Analysis

11 articles tagged with #voice-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Mar 17 · 7/10

$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps, with voice agents achieving only 30-45% of text-based AI capability under realistic conditions with noise and diverse accents.

🧠 GPT-5
AI · Bullish · OpenAI News · Oct 1 · 7/10 · 5

Introducing the Realtime API

OpenAI has launched a new Realtime API that enables developers to integrate fast speech-to-speech capabilities directly into their applications. This API allows for real-time voice interactions without the traditional delays of converting speech to text and back to speech.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Efficient Training for Cross-lingual Speech Language Models

Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient training method for building multilingual speech AI systems using discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training and instruction fine-tuning, enabling effective speech LLMs without requiring massive datasets.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Researchers developed a protocol to evaluate speaker verification capabilities in speech-aware large language models, finding weak performance with error rates above 20%. They introduced ECAPA-LLM, a lightweight augmentation that achieves a 1.03% error rate by integrating speaker embeddings while maintaining a natural language interface.

AI · Bullish · Wired – AI · Mar 3 · 7/10 · 6

This AI Agent Is Ready to Serve, Mid-Phone Call

Deutsche Telekom is partnering with ElevenLabs to integrate AI assistant functionality directly into phone calls across its German network without requiring any app installation. This represents a significant step toward mainstream AI integration in telecommunications infrastructure.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 13

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

Researchers conducted the first Turing test for speech-to-speech AI systems, analyzing 2,968 human judgments across 9 state-of-the-art systems. No current S2S system passed the test, with failures primarily stemming from paralinguistic features and emotional expressivity rather than semantic understanding.

AI · Bullish · OpenAI News · Jan 20 · 6/10 · 4

ServiceNow powers actionable enterprise AI with OpenAI

ServiceNow is expanding its integration with OpenAI to bring advanced AI capabilities to enterprise workflows. The partnership will enable AI-driven summarization, search, and voice features across ServiceNow's platform to enhance business operations.

AI · Bullish · OpenAI News · Jan 7 · 6/10 · 5

How Tolan builds voice-first AI with GPT-5.1

Tolan has developed a voice-first AI companion built on GPT-5.1, featuring low-latency responses and real-time context reconstruction. The system incorporates memory-driven personalities to enable more natural conversational experiences.

AI · Bullish · Google DeepMind Blog · Dec 12 · 6/10 · 5

Improved Gemini audio models for powerful voice experiences

Google has announced improvements to its Gemini audio models, enhancing voice interaction capabilities for more powerful and natural voice experiences. The upgrades focus on better audio processing and response quality in conversational AI applications.

AI · Neutral · OpenAI News · Jun 7 · 5/10 · 7

Expanding on how Voice Engine works and our safety research

OpenAI provides technical insights into Voice Engine, its text-to-speech model, along with details about its safety research approach. The article explores the underlying technology and the safety considerations for its voice synthesis capabilities.