AI · Bullish · Google DeepMind Blog · 5d ago · 7/10
🧠Google has launched Gemini 3.5 Live Translate, a near real-time speech translation feature integrated into Google AI Studio, Google Translate, and Google Meet. The technology enables fluid, natural voice translation across multiple platforms, reducing language barriers in communication.
🏢 Google · 🧠 Gemini
AI · Bullish · arXiv – CS AI · May 28 · 7/10
🧠Researchers introduce ESRT, a privacy-preserving edge-cloud framework for multilingual speech-to-text translation that processes voice data locally while transmitting only compressed features to the cloud. The system achieves state-of-the-art performance across 45 languages while reducing bandwidth requirements by 10x and preventing voiceprint leakage.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce ELF-S2T, a novel continuous-target generative model for speech-to-text tasks that combines audio conditioning with diffusion-based language modeling. The approach achieves competitive performance on ASR and speech translation while revealing that both tasks share common error patterns rooted in continuous latent space representations.
AI · Neutral · arXiv – CS AI · Jun 1 · 6/10
🧠OpenSTBench introduces a unified evaluation framework for assessing speech translation systems across multiple dimensions including translation quality, speech quality, speaker preservation, and temporal consistency. The framework addresses a critical gap in the field by enabling comprehensive comparison of heterogeneous speech translation outputs that differ in modality and timing behavior, with code and datasets made publicly available.
AI · Bullish · arXiv – CS AI · Jun 1 · 6/10
🧠Researchers introduce DOA (Decoder-Only Attention), a training-free method that enables simultaneous speech-to-text translation using decoder-only SpeechLLMs by extracting alignment signals from self-attention mechanisms. The approach achieves low-latency, long-form translation quality comparable to offline decoding without requiring model retraining.
AI · Bullish · MarkTechPost · Mar 16 · 6/10
🧠IBM has released Granite 4.0 1B Speech, a compact multilingual speech-language model optimized for automatic speech recognition and translation. The model is specifically designed for enterprise and edge deployments where memory efficiency, low latency, and compute optimization are critical alongside performance quality.
AI · Bullish · Google Research Blog · Nov 19 · 6/10 · 4
🧠The article discusses real-time speech-to-speech translation, focusing on algorithms and theoretical approaches. It represents an advance in AI-powered language processing for instant spoken communication across languages.
AI · Neutral · arXiv – CS AI · Mar 9 · 4/10
🧠Researchers developed two new latency metrics, YAAL and LongYAAL, to better evaluate simultaneous speech-to-text translation systems, addressing structural biases in existing measurement methods. They also introduced SoftSegmenter, a resegmentation tool that enables more reliable assessment of both short- and long-form translation systems.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4
🧠Researchers developed an optimized speech-to-text translation pipeline for Nepali-to-English that addresses punctuation loss in low-resource language processing. By adding a Punctuation Restoration Module, they achieved a 4.90-point BLEU improvement over baseline systems, demonstrating significant quality gains for cascaded translation architectures.