#voice-ai News & Analysis

30 articles tagged with #voice-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Sexualised synthetic personas encode and amplify gendered power asymmetries through voice

A research study examines how commercial AI voice platforms reproduce gendered power asymmetries, finding that female-coded voices are consistently described with sexualized and submissive language while male-coded voices receive associations with dominance and positive traits. The research reveals AI systems amplify narrow, binary, and heteronormative gender performances rather than enabling genuine diversity.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

CORTIS: Text-Only Adaptation of Spoken Language Models for Task-Oriented Voice Agents

Researchers introduce CORTIS, a framework that enables spoken language models (SLMs) to handle task-oriented voice agent functions using only text-based training data, eliminating the need for expensive paired speech-target annotations. The approach matches or outperforms traditional ASR-LLM cascades while demonstrating superior robustness under acoustic degradation.

AI · Bullish · Crypto Briefing · Jun 23 · 7/10

OpenAI prepares ChatGPT voice upgrade with Bidi 1 model

OpenAI is developing the GPT-Bidi-1 model designed to enhance ChatGPT's voice capabilities with improved real-time conversational fluidity and adaptability. This advancement represents a significant upgrade to AI voice interaction technology that could reshape how users engage with conversational AI systems.

🏢 OpenAI · 🧠 ChatGPT

AI × Crypto · Bullish · Crypto Briefing · Jun 9 · 7/10

Deepgram partners with Fortanix and Nvidia for secure voice AI deployment in regulated industries

Deepgram has partnered with Fortanix and Nvidia to deploy secure voice AI solutions in regulated industries, addressing the critical need for privacy-preserving AI in data-sensitive sectors. This collaboration enables advanced AI capabilities while maintaining compliance and data protection standards.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound

Researchers introduce Audio-FLAN, a large-scale instruction-tuning dataset with over 100 million instances covering 80 diverse tasks across speech, music, and sound domains. This dataset addresses a critical gap in unified audio-language models by enabling both audio understanding and generation tasks, advancing the integration of audio capabilities into large language models.

🏢 Hugging Face

AI · Bullish · Decrypt – AI · May 26 · 7/10

StepFun's Voice AI Topped Every Benchmark. It Also Hears Your Sighs

StepFun, a Shanghai-based AI lab known for developing efficient large language models, has achieved top benchmark results in voice AI technology with notable sensitivity to acoustic nuances like sighs. The breakthrough demonstrates the lab's capability to extend its LLM expertise into multimodal AI, potentially reshaping voice recognition and AI assistant markets.

AI × Crypto · Bullish · Crypto Briefing · May 9 · 7/10

OpenAI unveils GPT-5-class voice models for real-time orchestration

OpenAI has released GPT-5-class voice models designed for real-time orchestration, which could significantly impact cryptocurrency markets and decentralized computing infrastructure. The modular voice AI tools are positioned to drive innovation and investment in AI infrastructure sectors, with potential implications for how decentralized systems handle computational tasks.

🏢 OpenAI · 🧠 GPT-5

AI · Bullish · OpenAI News · May 7 · 7/10

Advancing voice intelligence with new models in the API

OpenAI has introduced new realtime voice models in its API that enable advanced capabilities including reasoning, translation, and speech transcription. These models represent a significant step toward more natural and intelligent voice-based interactions, expanding the practical applications available to developers building voice-enabled applications.

🏢 OpenAI

AI · Bullish · OpenAI News · May 4 · 7/10

How OpenAI delivers low-latency voice AI at scale

OpenAI has rebuilt its WebRTC infrastructure to enable real-time voice AI conversations with minimal latency and global scalability. The technical achievement demonstrates a significant advancement in conversational AI systems that can maintain natural turn-taking dynamics while serving users worldwide.

🏢 OpenAI

AI · Bearish · arXiv – CS AI · Mar 17 · 7/10

τ-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps, with voice agents achieving only 30-45% of text-based AI capability under realistic conditions with noise and diverse accents.

🧠 GPT-5

AI · Bullish · OpenAI News · Oct 1 · 7/10

Introducing the Realtime API

OpenAI has launched a new Realtime API that enables developers to integrate fast speech-to-speech capabilities directly into their applications. This API allows for real-time voice interactions without the traditional delays of converting speech to text and back to speech.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

SpeechEQ: Benchmarking Emotional Intelligence Quotient in Socially Aware Voice Conversational Models

Researchers introduce SpeechEQ, a benchmarking framework that evaluates the emotional intelligence of voice-based AI models in multi-turn dialogue. The dataset of 2,265 dialogues reveals that current speech-language models fail to fully process paralinguistic cues, relying instead on text shortcuts and exhibiting contextual memory gaps.

🏢 Hugging Face

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Towards Robust Arabic Speech Emotion Recognition with Deep Learning

Researchers propose a CNN-Transformer hybrid architecture for Arabic Speech Emotion Recognition that achieves 98.1% accuracy, outperforming CNN-LSTM and fine-tuned wav2vec 2.0 models. The study addresses the underexplored challenge of emotion detection in Arabic speech by combining convolutional feature extraction with Transformer-based context modeling, demonstrating effectiveness in low-resource, dialectally diverse settings.

AI · Neutral · Hugging Face Blog · Jun 9 · 6/10

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Researchers benchmark frontier automatic speech recognition (ASR) systems on code-switched speech, where bilingual speakers mix languages mid-conversation. The study evaluates how well modern voice AI handles this common real-world scenario, revealing performance gaps that matter for customer service applications.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

Researchers propose IRAF, a lightweight module that improves full-duplex spoken dialogue systems by filtering interference from background speakers. The technology uses adaptive fusion to modulate user audio reliability frame-by-frame, demonstrating improved response quality and stable turn-taking in noisy acoustic environments.

AI · Bullish · TechCrunch – AI · Jun 3 · 6/10

These two founders left Goldman and Meta to build voice AI for markets everyone else overlooked

Former Goldman Sachs and Meta executives have launched a voice AI startup targeting underserved markets in Africa and the Middle East, now processing over 17,000 calls daily through their proprietary technology stack. The venture addresses a significant gap in AI infrastructure for emerging markets where traditional financial services remain limited.

AI · Bullish · Blockonomi · May 29 · 6/10

Alibaba Voice AI Model Beats OpenAI and xAI on Global Benchmark

Alibaba's Fun-Realtime-TTS-Preview voice AI model ranked fifth on the Artificial Analysis Speech Arena leaderboard, outperforming systems from OpenAI and xAI. This achievement marks Alibaba as the only Chinese-engineered voice system in the global top five, supporting 30+ languages and multiple Chinese dialects.

🏢 OpenAI · 🏢 xAI

AI · Neutral · TechCrunch – AI · May 10 · 6/10

Voice AI in India is hard. Wispr Flow is betting on it anyway.

Wispr Flow has accelerated growth in India following its Hinglish language rollout, demonstrating market demand for voice AI solutions in regional languages. However, the company operates within a challenging landscape where voice AI products face significant technical and adoption hurdles across the Indian market.

AI · Bullish · Blockonomi · May 2 · 6/10

SoundHound AI (SOUN) Stock Rallies 20% Following Twilio’s Bullish Voice AI Results

SoundHound AI (SOUN) stock surged 20.1% following positive voice AI results reported by competitor Twilio, capitalizing on market enthusiasm for the voice AI sector. The rally comes ahead of SOUN's own Q1 earnings announcement scheduled for Thursday, which could provide additional catalyst for the stock's momentum.

AI · Bullish · Blockonomi · Apr 21 · 6/10

SoundHound AI (SOUN) Stock Climbs 3% Despite Broader Tech Sector Weakness

SoundHound AI (SOUN) gained 3% on Monday despite a broader technology sector decline triggered by U.S.-Iran geopolitical tensions. The stock's resilience reflects strong fundamental performance, with the company reporting 59.4% year-over-year revenue growth and analyst price targets exceeding $14.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Efficient Training for Cross-lingual Speech Language Models

Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient training method for building multilingual speech AI systems using discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training and instruction fine-tuning, enabling effective speech LLMs without requiring massive datasets.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

Speaker Verification with Speech-Aware LLMs: Evaluation and Augmentation

Researchers developed a protocol to evaluate speaker verification capabilities in speech-aware large language models, finding weak performance with error rates above 20%. They introduced ECAPA-LLM, a lightweight augmentation that achieves a 1.03% error rate by integrating speaker embeddings while maintaining a natural language interface.

AI · Bullish · Wired – AI · Mar 3 · 7/10

This AI Agent Is Ready to Serve, Mid-Phone Call

Deutsche Telekom is partnering with ElevenLabs to integrate AI assistant functionality directly into phone calls across its German network without requiring any app installation. This represents a significant step toward mainstream AI integration in telecommunications infrastructure.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

Researchers conducted the first Turing test for speech-to-speech AI systems, analyzing 2,968 human judgments across 9 state-of-the-art systems. No current S2S system passed the test, with failures primarily stemming from paralinguistic features and emotional expressivity rather than semantic understanding.

AI · Bullish · OpenAI News · Jan 20 · 6/10

ServiceNow powers actionable enterprise AI with OpenAI

ServiceNow is expanding its integration with OpenAI to bring advanced AI capabilities to enterprise workflows. The partnership will enable AI-driven summarization, search, and voice features across ServiceNow's platform to enhance business operations.
