11 articles tagged with #voice-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · Mar 17 · 7/10
Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps: under realistic conditions with background noise and diverse accents, voice agents achieve only 30-45% of the capability of text-based agents.
GPT-5
AI · Bullish · OpenAI News · Oct 1 · 7/10
OpenAI has launched a new Realtime API that lets developers build fast speech-to-speech capabilities directly into their applications. Because it handles voice interactions in real time, it avoids the delays of the traditional pipeline that converts speech to text and then back to speech.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient method for training multilingual speech AI systems on discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training followed by instruction fine-tuning, producing effective speech LLMs without requiring massive datasets.
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
Researchers developed a protocol for evaluating speaker verification in speech-aware large language models and found weak performance, with error rates above 20%. They also introduced ECAPA-LLM, a lightweight augmentation that integrates speaker embeddings into the model while keeping its natural-language interface, reducing the error rate to 1.03%.
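The summary does not name the metric, but speaker verification results are conventionally reported as equal error rate (EER), the operating point where false acceptances and false rejections occur at the same rate. Assuming that is the metric here, the sketch below shows how it can be computed from trial scores, using cosine similarity between speaker embeddings (a common scoring choice for ECAPA-style embeddings) as the scoring function. The helper names and toy values are hypothetical; this is not the paper's ECAPA-LLM method, which integrates the embeddings into the LLM itself.

```python
import math


# Hypothetical scoring backend: cosine similarity between two speaker embeddings.
# Real systems would get these vectors from a speaker encoder; the ones below are made up.
def cosine_score(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping a threshold over every observed score.

    genuine:  scores for same-speaker trials (should be high)
    impostor: scores for different-speaker trials (should be low)
    A trial is accepted when its score >= threshold.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy embeddings for one enrollment and two test utterances (illustrative only).
    enroll = [0.9, 0.1, 0.2]
    same_speaker = [0.85, 0.15, 0.25]
    other_speaker = [0.1, 0.9, 0.3]
    print(cosine_score(enroll, same_speaker), cosine_score(enroll, other_speaker))

    # Toy score lists: one genuine trial scores below one impostor trial.
    genuine = [0.9, 0.8, 0.4]
    impostor = [0.5, 0.2, 0.1]
    print(f"EER = {equal_error_rate(genuine, impostor):.2%}")
```

Read this way, an EER above 20% means that at the balanced threshold roughly one in five impostor trials is accepted and one in five genuine trials is rejected, while 1.03% brings both down to about one in a hundred.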
AI · Bullish · Wired – AI · Mar 3 · 7/10
Deutsche Telekom is partnering with ElevenLabs to build AI assistant functionality directly into phone calls on its German network, with no app installation required. The move marks a significant step toward integrating AI into mainstream telecommunications infrastructure.
AI · Neutral · arXiv – CS AI · Mar 2 · 6/10
Researchers conducted the first Turing test for speech-to-speech (S2S) AI systems, analyzing 2,968 human judgments across 9 state-of-the-art systems. No current S2S system passed; the failures stemmed primarily from paralinguistic features and emotional expressivity rather than from semantic understanding.
AI · Bullish · OpenAI News · Jan 20 · 6/10
ServiceNow is expanding its integration with OpenAI to bring advanced AI capabilities to enterprise workflows. The partnership will enable AI-driven summarization, search, and voice features across ServiceNow's platform to enhance business operations.
AI · Bullish · OpenAI News · Jan 7 · 6/10
Tolan has built a voice-first AI companion on GPT-5.1, featuring low-latency responses and real-time context reconstruction. The system uses memory-driven personalities to make conversations feel more natural.
AI · Bullish · Google DeepMind Blog · Dec 12 · 6/10
Google has announced improvements to its Gemini audio models aimed at more powerful and natural voice interactions. The upgrades focus on better audio processing and higher response quality in conversational AI applications.
AI · Neutral · OpenAI News · Jun 7 · 5/10
OpenAI shares technical details about Voice Engine, its text-to-speech model, along with its approach to safety research. The article covers both the underlying technology and the safety considerations around its voice synthesis capabilities.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
Researchers propose ZeSTA, a domain-conditioned training framework that improves personalized speech synthesis by better integrating synthetic and real speech data. The method addresses the loss of speaker similarity that occurs when zero-shot text-to-speech augmentation is used alongside only a limited number of real recordings.