y0news

#asr News & Analysis

25 articles tagged with #asr. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Mar 27 · 6/10

Back to Basics: Revisiting ASR in the Age of Voice Agents

Researchers introduced WildASR, a multilingual diagnostic benchmark revealing that current ASR systems suffer severe performance degradation in real-world conditions despite achieving near-human accuracy on curated tests. The study found that ASR models often hallucinate plausible but unspoken content under degraded inputs, creating safety risks for voice agents.

AI · Bullish · MarkTechPost · Mar 26 · 6/10

Cohere AI Releases Cohere Transcribe: A SOTA Automatic Speech Recognition (ASR) Model Powering Enterprise Speech Intelligence

Cohere AI has released Cohere Transcribe, a new state-of-the-art Automatic Speech Recognition (ASR) model designed for enterprise applications. This marks the company's expansion beyond text generation and embedding models into the speech recognition market, targeting enterprise speech intelligence solutions.

๐Ÿข Cohere
AI · Bullish · MarkTechPost · Mar 17 · 6/10

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models

Google AI has released WAXAL, an open multilingual speech dataset covering 24 African languages to improve Automatic Speech Recognition and Text-to-Speech systems. The release targets the underrepresentation of African languages in speech-technology training corpora.

๐Ÿข Google
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10

Latent Speech-Text Transformer

Facebook Research introduces the Latent Speech-Text Transformer (LST), which aggregates speech tokens into higher-level patches to improve computational efficiency and cross-modal alignment. The model achieves up to +6.5% absolute gain on speech HellaSwag benchmarks while maintaining text performance and reducing inference costs for ASR and TTS tasks.
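The paper's exact pooling scheme isn't described in this summary; as a rough illustration of the idea, mean-pooling runs of consecutive speech-token embeddings into fixed-size patches (function name and patch size are invented here) shortens the sequence the transformer must attend over:

```python
def aggregate_patches(frames, patch_size):
    """Mean-pool consecutive frame embeddings into higher-level patches,
    shrinking the sequence length by roughly a factor of patch_size."""
    patches = []
    for i in range(0, len(frames), patch_size):
        chunk = frames[i:i + patch_size]
        dim = len(chunk[0])
        patches.append([sum(vec[d] for vec in chunk) / len(chunk)
                        for d in range(dim)])
    return patches

# A 10-frame sequence with patch_size=4 collapses to 3 patches;
# the final patch averages the 2 leftover frames.
frames = [[float(i), float(i) * 2] for i in range(10)]
patches = aggregate_patches(frames, 4)
```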

AI · Bullish · arXiv – CS AI · Mar 4 · 5/10 · 3

GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR

Researchers developed GLoRIA, a parameter-efficient framework for automatic speech recognition that adapts to regional dialects using location metadata. The system achieves state-of-the-art performance while updating less than 10% of model parameters and demonstrates strong generalization to unseen dialects.
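The summary doesn't give GLoRIA's exact formulation; a generic gated low-rank update, where a scalar gate (imagined here as derived from location metadata) scales a rank-r correction B(Ax) added on top of a frozen weight, can be sketched as follows. All names and shapes are illustrative:

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def gated_lora_forward(W, A, B, gate, x):
    """Frozen layer plus gated low-rank correction: y = Wx + gate * B(Ax).
    Only A, B, and the gate would be trained, leaving W untouched —
    which is how such schemes stay under 10% of model parameters."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + gate * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight (identity for the demo)
A = [[1.0, 1.0]]               # rank-1 down-projection (2 -> 1)
B = [[0.5], [0.5]]             # rank-1 up-projection (1 -> 2)
x = [2.0, 4.0]

gate_closed = gated_lora_forward(W, A, B, 0.0, x)  # gate off: plain Wx
gate_open = gated_lora_forward(W, A, B, 1.0, x)    # gate on: adapted output
```

With the gate at zero the layer reduces exactly to the frozen base model, which is what lets a single backbone serve dialects it was never adapted to.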

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Researchers introduce Whisper-MLA, a modified version of OpenAI's Whisper speech recognition model that uses Multi-Head Latent Attention to reduce GPU memory consumption by up to 87.5% while maintaining accuracy. The innovation addresses a key scalability issue with transformer-based ASR models when processing long-form audio.
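Multi-Head Latent Attention saves memory by caching one small latent per token instead of full keys and values. The paper's real dimensions aren't in this summary, so the numbers below are assumptions picked purely to show how a latent of 320 against K+V of 2 × 1280 per token per layer reproduces the quoted 87.5% reduction:

```python
def mha_kv_values(seq_len, n_layers, d_model):
    """Values cached per sequence under standard multi-head attention:
    a full K and a full V (d_model each) per token per layer."""
    return seq_len * n_layers * 2 * d_model

def mla_kv_values(seq_len, n_layers, latent_dim):
    """Values cached under latent attention: one shared compressed
    latent per token per layer, from which K and V are reconstructed."""
    return seq_len * n_layers * latent_dim

mha = mha_kv_values(seq_len=1500, n_layers=32, d_model=1280)
mla = mla_kv_values(seq_len=1500, n_layers=32, latent_dim=320)
reduction = 1 - mla / mha  # fraction of KV-cache memory saved
```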

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15

Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing

Researchers developed Whisper-LLaDA, a diffusion-based large language model for automatic speech recognition that achieves 12.3% relative improvement over baseline models. The study demonstrates that audio-conditioned embeddings are crucial for accuracy improvements, while plain-text processing without acoustic features fails to enhance performance.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

Researchers developed a new AI framework using RNN-T architecture to improve speech recognition for Taiwanese Hakka, an endangered low-resource language with high dialectal variability. The system achieved 57% and 40% relative error rate reductions for two different writing systems, marking the first systematic investigation into Hakka dialect variations in ASR.

AI · Bullish · arXiv – CS AI · Mar 17 · 5/10

Speech Recognition on TV Series with Video-guided Post-ASR Correction

Researchers have developed a Video-Guided Post-ASR Correction (VPC) framework that uses Video-Large Multimodal Models to improve speech recognition accuracy in complex environments like TV series. The system addresses challenges with multiple speakers, overlapping speech, and domain-specific terminology by leveraging video context to refine ASR outputs.

AI · Neutral · arXiv – CS AI · Mar 17 · 5/10

Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition

Researchers developed a novel Bayesian Low-rank Adaptation method for personalizing automatic speech recognition systems to better understand impaired speech. The approach addresses challenges in ASR systems like Whisper that struggle with non-normative speech patterns from conditions like cerebral palsy, using data-efficient fine-tuning on English and German datasets.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Researchers introduce Whisper-RIR-Mega, a new benchmark dataset for testing automatic speech recognition robustness in reverberant acoustic environments. The study evaluates five Whisper models and finds that reverberation consistently degrades performance across all model sizes, with word error rates increasing by 0.12 to 1.07 percentage points.
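Word error rate, the metric behind those figures, is the word-level edit distance divided by the number of reference words. A minimal implementation of the standard dynamic-programming form:

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words, normalized
    by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

One substitution in a four-word reference, e.g. `wer("the quick brown fox", "the quick brown box")`, gives 0.25, i.e. 25% WER.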

AI · Bullish · arXiv – CS AI · Mar 4 · 4/10 · 2

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization

Researchers developed a multistage AI approach for Bengali speech transcription and speaker diarization, achieving significant improvements in processing long-form audio recordings. The system used fine-tuned Whisper models and custom segmentation techniques to address the low-resource nature of Bengali in speech technology applications.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration

Researchers developed an optimized speech-to-text translation pipeline for Nepali-to-English that addresses punctuation loss issues in low-resource language processing. By implementing a Punctuation Restoration Module, they achieved a 4.90 BLEU point improvement over baseline systems, demonstrating significant quality gains for cascaded translation architectures.
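The module's internals aren't described here; punctuation restoration is commonly framed as tagging each word with the punctuation that should follow it. Rebuilding text from such tags (the tag set and example are invented for illustration) might look like:

```python
def apply_punctuation(tokens, tags):
    """Attach a predicted trailing-punctuation tag ('' , '.' or ',')
    to each token and re-capitalize sentence starts."""
    out, capitalize = [], True
    for token, tag in zip(tokens, tags):
        word = token.capitalize() if capitalize else token
        out.append(word + tag)
        capitalize = tag == "."   # next word opens a new sentence
    return " ".join(out)

restored = apply_punctuation(
    ["hello", "world", "how", "are", "you"],
    ["", ".", "", "", "."],
)
```

Restoring sentence boundaries before the MT stage matters in cascaded pipelines because the translator was trained on punctuated text.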

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 2

A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment

Researchers developed a robust framework for Bangla automatic speech recognition and speaker diarization that can handle long-form audio exceeding 30-60 seconds. The system uses Voice Activity Detection optimization and Connectionist Temporal Classification segmentation to maintain accuracy over extended durations in multi-speaker environments.
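The framework's actual VAD configuration isn't given; the core idea, splitting long audio at silences so each chunk fits the recognizer's window, can be sketched with a simple energy threshold over frames (the threshold and frame energies below are placeholders):

```python
def vad_segments(frame_energies, threshold):
    """Return (start, end) frame-index pairs for runs of frames whose
    energy clears the threshold; everything else counts as silence."""
    segments, start = [], None
    for i, energy in enumerate(frame_energies):
        if energy >= threshold and start is None:
            start = i                      # speech begins
        elif energy < threshold and start is not None:
            segments.append((start, i))    # speech ends
            start = None
    if start is not None:                  # speech ran to the end
        segments.append((start, len(frame_energies)))
    return segments

# Two speech bursts separated by silence
segments = vad_segments([0.0, 0.1, 5.0, 6.2, 0.0, 0.0, 7.1, 0.2],
                        threshold=1.0)
```

Each resulting segment can then be transcribed independently and stitched back together, which is what keeps accuracy stable past the 30-60 second mark.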

AI · Neutral · Hugging Face Blog · Nov 21 · 4/10 · 8

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

The article title suggests coverage of the Open ASR (Automatic Speech Recognition) Leaderboard, focusing on trends and insights with new multilingual and long-form evaluation tracks. However, the article body appears to be empty or not provided, limiting the ability to extract specific details about ASR developments.

AI · Bullish · Hugging Face Blog · May 1 · 5/10 · 6

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

The article appears to discuss advanced AI speech processing technologies including Automatic Speech Recognition (ASR), speaker diarization, and speculative decoding capabilities available through Hugging Face Inference Endpoints. However, the article body content is not provided for detailed analysis.

AI · Neutral · Hugging Face Blog · Jan 19 · 4/10 · 4

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

The article appears to be about fine-tuning W2V2-Bert (Wav2Vec2-BERT) for automatic speech recognition in low-resource languages using Hugging Face Transformers. However, the article body is empty, preventing detailed analysis of the technical implementation or methodology.

AI · Neutral · Hugging Face Blog · Jun 19 · 4/10 · 6

Fine-Tune MMS Adapter Models for low-resource ASR

The article discusses fine-tuning MMS (Massively Multilingual Speech) adapter models for automatic speech recognition (ASR) in low-resource language scenarios. This approach aims to improve speech recognition performance for languages with limited training data by leveraging pre-trained multilingual models and adapter techniques.

AI · Neutral · Hugging Face Blog · Nov 3 · 4/10 · 6

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

The article appears to discuss fine-tuning Whisper, OpenAI's automatic speech recognition model, for multilingual applications using Hugging Face Transformers library. However, the article body is empty, making detailed analysis impossible.

AI · Bullish · Hugging Face Blog · Nov 15 · 4/10 · 6

Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers

The article appears to be about fine-tuning XLSR-Wav2Vec2, a speech recognition model, for automatic speech recognition (ASR) in low-resource languages using Hugging Face Transformers. This represents a technical advancement in AI speech processing capabilities for underserved languages.