#speech-translation News & Analysis

10 articles tagged with #speech-translation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · Google DeepMind Blog · Jun 9 · 7/10

Fluid, natural voice translation with Gemini 3.5 Live Translate

Google has launched Gemini 3.5 Live Translate, a near real-time speech translation feature integrated into Google AI Studio, Google Translate, and Google Meet. The technology enables fluid, natural voice translation across multiple platforms, reducing language barriers in communication.

🏢 Google · 🧠 Gemini

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

Researchers introduce ESRT, a privacy-preserving edge-cloud framework for multilingual speech-to-text translation that processes voice data locally while transmitting only compressed features to the cloud. The system achieves state-of-the-art performance across 45 languages while reducing bandwidth requirements by 10x and preventing voiceprint leakage.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

STEB: A Speech-to-Speech Translation Expressiveness Benchmark for Evaluating Beyond Translation Fidelity

Researchers introduced STEB, a new benchmark for evaluating speech-to-speech translation systems on both translation accuracy and emotional expressiveness preservation. Testing six systems revealed that while translation fidelity is strong, emotion and nonverbal vocalization preservation remain significant challenges, highlighting a critical gap in current AI capabilities.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Speech Meets ELF: Audio Conditional Continuous-Target Diffusion for Speech Recognition and Translation

Researchers introduce ELF-S2T, a novel continuous-target generative model for speech-to-text tasks that combines audio conditioning with diffusion-based language modeling. The approach achieves competitive performance on ASR and speech translation while revealing that both tasks share common error patterns rooted in continuous latent space representations.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

OpenSTBench: Beyond Semantic Evaluation for Speech Translation

OpenSTBench introduces a unified evaluation framework for assessing speech translation systems across multiple dimensions including translation quality, speech quality, speaker preservation, and temporal consistency. The framework addresses a critical gap in the field by enabling comprehensive comparison of heterogeneous speech translation outputs that differ in modality and timing behavior, with code and datasets made publicly available.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

Researchers introduce DOA (Decoder-Only Attention), a training-free method that enables simultaneous speech-to-text translation using decoder-only SpeechLLMs by extracting alignment signals from self-attention mechanisms. The approach achieves low-latency, long-form translation quality comparable to offline decoding without requiring model retraining.

AI · Bullish · MarkTechPost · Mar 16 · 6/10

IBM AI Releases Granite 4.0 1B Speech as a Compact Multilingual Speech Model for Edge AI and Translation Pipelines

IBM has released Granite 4.0 1B Speech, a compact multilingual speech-language model optimized for automatic speech recognition and translation. The model is specifically designed for enterprise and edge deployments where memory efficiency, low latency, and compute optimization are critical alongside performance quality.

AI · Bullish · Google Research Blog · Nov 19 · 6/10 · 4

Real-time speech-to-speech translation

The article discusses real-time speech-to-speech translation, with a focus on algorithms and theory. It reports progress in AI-powered language processing for instant spoken communication across languages.

AI · Neutral · arXiv – CS AI · Mar 9 · 4/10

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

Researchers developed two new latency metrics, YAAL and LongYAAL, to better evaluate simultaneous speech-to-text translation systems by addressing structural biases in existing measurement methods. They also introduced SoftSegmenter, a resegmentation tool that enables more reliable assessment of both short- and long-form translation systems.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration

Researchers developed an optimized cascaded speech-to-text translation pipeline for Nepali-to-English that addresses punctuation loss in low-resource language processing. By adding a Punctuation Restoration Module, they achieved a 4.90-point BLEU improvement over baseline systems, demonstrating significant quality gains for cascaded translation architectures.
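
For readers unfamiliar with cascaded architectures, the idea in the summary above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `restore_punctuation` is a toy rule-based stand-in for what is presumably a learned module, and `run_cascade`, `asr`, and `translate` are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Optional


def restore_punctuation(text: str) -> str:
    """Toy placeholder (assumption): capitalize the first letter and add a
    final period. A real punctuation restoration module would be a model."""
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".?!":
        text += "."
    return text


def run_cascade(
    audio: Any,
    asr: Callable[[Any], str],
    translate: Callable[[str], str],
    restore: Optional[Callable[[str], str]] = None,
) -> str:
    """Cascaded speech-to-text translation: ASR, optional punctuation
    restoration, then machine translation of the resulting transcript."""
    transcript = asr(audio)  # ASR output is often unpunctuated
    if restore is not None:
        transcript = restore(transcript)  # repair structure before MT
    return translate(transcript)


if __name__ == "__main__":
    # Stub components (assumptions) so the sketch runs without any models.
    fake_asr = lambda _audio: "hello how are you"
    fake_mt = lambda text: f"[en] {text}"
    print(run_cascade(None, fake_asr, fake_mt))
    print(run_cascade(None, fake_asr, fake_mt, restore_punctuation))
```

Placing the restoration step between ASR and translation means the translation stage sees sentence-like input, which is the kind of structural noise the summary says the pipeline targets.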