44 articles tagged with #speech-recognition. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers introduce ACES, a new method to analyze how automatic speech recognition systems perform differently across accents. The study finds that accent information is concentrated in early neural network layers and is deeply intertwined with speech recognition capabilities, making simple bias removal ineffective.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Research paper compares three sinusoidal models for speech and audio signal processing: standard Sinusoidal Model (SM), Exponentially Damped Sinusoidal Model (EDSM), and extended adaptive Quasi-Harmonic Model (eaQHM). The study finds eaQHM performs better for medium-to-large window analysis while EDSM excels with smaller analysis windows, suggesting future research should combine both approaches.
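The two simpler model families can be illustrated in miniature. The sketch below is a hedged toy (function names and parameter layout are illustrative, not from the paper): the standard sinusoidal model sums stationary sinusoids, while EDSM gives each partial a damping rate so its amplitude decays exponentially within the analysis window.

```python
import math

def sm_frame(partials, n, sr=16000):
    """Standard sinusoidal model (SM): a frame is a sum of stationary
    sinusoids, each given as (amplitude, frequency_hz, phase)."""
    return [sum(a * math.cos(2 * math.pi * f * t / sr + p)
                for a, f, p in partials)
            for t in range(n)]

def edsm_frame(partials, n, sr=16000):
    """Exponentially damped sinusoidal model (EDSM): each partial adds a
    damping rate d (per second), so its amplitude decays as exp(-d * t)."""
    return [sum(a * math.exp(-d * t / sr) * math.cos(2 * math.pi * f * t / sr + p)
                for a, f, p, d in partials)
            for t in range(n)]
```

Per the study's framing, eaQHM goes further by letting amplitude and frequency vary inside the window, which is why it wins on larger windows where the stationarity assumption breaks down.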
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers introduce Whisper-RIR-Mega, a new benchmark dataset for testing automatic speech recognition robustness in reverberant acoustic environments. The study evaluates five Whisper models and finds that reverberation consistently degrades performance across all model sizes, with word error rates increasing by 0.12 to 1.07 percentage points.
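Those percentage-point figures are differences in word error rate. As a reminder of what is being measured, here is a minimal WER implementation (a generic sketch, not the benchmark's own evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by the
    number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one inserted word against a three-word reference gives a WER of 1/3, i.e. 33.3 percentage points.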
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers developed MEBM-Phoneme, a neural decoder that uses magnetoencephalography (MEG) brain signals to classify phonemes with enhanced accuracy. The system integrates multi-scale convolutional modules and attention mechanisms to improve speech perception analysis from non-invasive brain recordings.
AI · Bullish · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers developed a multistage AI approach for Bengali speech transcription and speaker diarization, achieving significant improvements in processing long-form audio recordings. The system used fine-tuned Whisper models and custom segmentation techniques to address the low-resource nature of Bengali in speech technology applications.
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠 Researchers developed a multi-pass LLM post-processing system that improves French clinical speech transcription accuracy by alternating between speaker-recognition and word-recognition passes. The system achieved substantial word error rate reductions in suicide-prevention conversations while remaining stable in neurosurgery consultations, at computational costs feasible for clinical deployment.
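Abstractly, the alternating multi-pass loop looks like the sketch below. The two passes stand in for the paper's LLM prompts and are purely hypothetical callables; only the alternation structure is taken from the summary.

```python
def multipass_postprocess(transcript, speaker_pass, word_pass, n_rounds=2):
    """Alternate two post-processing passes over a raw transcript: one
    refines speaker labels, the other corrects recognized words. Both
    passes are callables transcript -> transcript (hypothetical stand-ins
    for LLM-prompted rewrites)."""
    for _ in range(n_rounds):
        transcript = speaker_pass(transcript)  # who said it
        transcript = word_pass(transcript)     # what was said
    return transcript
```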
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠 Researchers developed a robust framework for Bangla automatic speech recognition and speaker diarization that can handle long-form audio exceeding 30–60 seconds. The system uses Voice Activity Detection optimization and Connectionist Temporal Classification segmentation to maintain accuracy over extended durations in multi-speaker environments.
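A toy version of the VAD stage shows the idea of cutting long audio into tractable speech segments before recognition. The energy threshold and frame size here are illustrative assumptions, not the paper's configuration:

```python
def vad_segments(samples, frame_len=160, threshold=0.01):
    """Energy-based voice activity detection: mark a frame as speech when
    its mean absolute amplitude exceeds `threshold`, then merge runs of
    speech frames into (start, end) sample ranges."""
    segments, start = [], None
    for f in range(0, len(samples), frame_len):
        frame = samples[f:f + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = f          # speech onset
        elif start is not None:
            segments.append((start, f))  # speech offset
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # speech ran to the end
    return segments
```

Each returned segment would then be transcribed independently (with CTC segmentation refining boundaries in the paper's pipeline), keeping every recognition call within the model's comfortable input length.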
AI · Neutral · Hugging Face Blog · Nov 21 · 4/10
🧠 The article title suggests coverage of the Open ASR (Automatic Speech Recognition) Leaderboard, focusing on trends and insights with new multilingual and long-form evaluation tracks. However, the article body is empty, preventing extraction of specific details about ASR developments.
AI · Bullish · Hugging Face Blog · May 1 · 5/10
🧠 The article appears to discuss advanced AI speech processing technologies including Automatic Speech Recognition (ASR), speaker diarization, and speculative decoding capabilities available through Hugging Face Inference Endpoints. However, the article body content is not provided for detailed analysis.
AI · Neutral · Hugging Face Blog · Jan 19 · 4/10
🧠 The article appears to be about fine-tuning W2V2-Bert (Wav2Vec2-BERT) for automatic speech recognition in low-resource languages using Hugging Face Transformers. However, the article body is empty, preventing detailed analysis of the technical implementation or methodology.
AI · Bullish · Hugging Face Blog · Dec 20 · 4/10
🧠 The article title suggests a technical advancement in Whisper inference using speculative decoding to achieve 2x faster processing speeds. However, no article body content was provided to analyze the specific implementation or implications.
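Speculative decoding itself can be sketched in miniature: a small draft model proposes several tokens and the large target model verifies them, keeping the agreed prefix. The toy greedy variant below is illustrative only; in practice (e.g. `assistant_model` in Hugging Face Transformers' `generate`) the target model scores all draft tokens in one batched forward pass, which is where the ~2x speedup comes from.

```python
def speculative_step(draft_tokens, verify_next):
    """One speculative-decoding step: accept the draft model's tokens left
    to right until the target model disagrees, then emit the target's
    correction. `verify_next(prefix)` returns the target model's greedy
    next token given `prefix` (a hypothetical stand-in for a real model)."""
    accepted = []
    for tok in draft_tokens:
        target_tok = verify_next(accepted)
        if target_tok == tok:
            accepted.append(tok)         # draft matched the target model
        else:
            accepted.append(target_tok)  # correct the mismatch and stop
            break
    return accepted
```

Because accepted tokens are exactly what the target model would have produced, the output distribution is unchanged; only wall-clock time improves.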
AI · Neutral · Hugging Face Blog · Jun 19 · 4/10
🧠 The article discusses fine-tuning MMS (Massively Multilingual Speech) adapter models for automatic speech recognition (ASR) in low-resource language scenarios. This approach aims to improve speech recognition performance for languages with limited training data by leveraging pre-trained multilingual models and adapter techniques.
AI · Neutral · Hugging Face Blog · Nov 3 · 4/10
🧠 The article appears to discuss fine-tuning Whisper, OpenAI's automatic speech recognition model, for multilingual applications using Hugging Face Transformers library. However, the article body is empty, making detailed analysis impossible.
AI · Neutral · Hugging Face Blog · Feb 1 · 4/10
🧠 The article appears to discuss implementing automatic speech recognition for processing large audio files using Wav2Vec2 model in Hugging Face Transformers library. However, the article body is empty, preventing detailed analysis of the technical implementation or implications.
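Large files are typically handled by chunking with overlap, so predictions near chunk edges can be discarded and the rest stitched together. A sketch of the boundary arithmetic (parameter names are illustrative; Hugging Face's ASR pipeline exposes similar `chunk_length_s` / `stride_length_s` options):

```python
def chunk_bounds(n_samples, chunk_s=30.0, stride_s=5.0, sr=16000):
    """Split a long recording into overlapping chunks for a CTC model
    like Wav2Vec2. Returns (start, end) sample indices; consecutive
    chunks overlap by 2 * stride_s seconds, so each chunk's first and
    last stride_s of predictions can be dropped as unreliable edges."""
    chunk, stride = int(chunk_s * sr), int(stride_s * sr)
    assert 2 * stride < chunk, "stride too large for chunk length"
    step = chunk - 2 * stride  # advance per chunk after trimming edges
    bounds, start = [], 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break
        start += step
    return bounds
```

A 60-second file at 16 kHz with 30 s chunks and 5 s strides yields three overlapping chunks covering every sample.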
AI · Neutral · Hugging Face Blog · Jan 12 · 4/10
🧠 The article appears to discuss technical improvements to Wav2Vec2, a speech recognition model, by incorporating n-gram language models within the Hugging Face Transformers library. This represents an advancement in AI speech processing technology that could enhance accuracy and performance of speech-to-text applications.
AI · Bullish · Hugging Face Blog · Nov 15 · 4/10
🧠 The article appears to be about fine-tuning XLSR-Wav2Vec2, a speech recognition model, for automatic speech recognition (ASR) in low-resource languages using Hugging Face Transformers. This represents a technical advancement in AI speech processing capabilities for underserved languages.
AI · Neutral · Hugging Face Blog · Mar 12 · 3/10
🧠 The article appears to be about fine-tuning Wav2Vec2, a speech recognition model, for English Automatic Speech Recognition using Hugging Face's Transformers library. However, the article body is empty, making detailed analysis impossible.
AI · Neutral · Hugging Face Blog · Jun 2 · 1/10
🧠 The article title references AI speech recognition technology implementation in Unity, a popular game development platform. However, no article body content was provided to analyze specific details, features, or implications.
AI · Neutral · Hugging Face Blog · Feb 8 · 1/10
🧠 The article appears to discuss SpeechT5, a technology for speech synthesis and recognition capabilities. However, the article body provided is empty, making it impossible to analyze the specific content, implications, or technical details.