#whisper News & Analysis

17 articles tagged with #whisper. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

🧠

Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

Researchers present a novel compression technique for speech foundation models using parameter clustering and k-means pruning without requiring training data or fine-tuning. The method demonstrates significant performance improvements over traditional magnitude-based pruning on HuBERT-large and Whisper-large-v3, with 27-59% relative WER reductions at various sparsity levels.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

🧠

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Whisfusion introduces a masked diffusion decoder that achieves faster speech-to-text processing than Whisper-large-v3 while matching or exceeding its accuracy across multilingual benchmarks. By replacing autoregressive decoding with parallel diffusion decoding, the system runs 4-5x faster while maintaining competitive performance with leading ASR systems, establishing non-autoregressive diffusion as a viable paradigm for high-throughput transcription.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition

Researchers propose ASKD-Whisper, a new knowledge distillation technique that compresses OpenAI's Whisper speech recognition model while improving performance. The method achieves 5x faster inference and 1.07% lower error rates than the original teacher model by dynamically reducing reliance on the teacher's predictions during training.

AI · Bullish · OpenAI News · Apr 24 · 7/10 · 5

🧠

GPT-4 API general availability and deprecation of older models in the Completions API

OpenAI has made the GPT-4 API generally available alongside the GPT-3.5 Turbo, DALL·E, and Whisper APIs. The company also announced a deprecation plan for older Completions API models, which will be retired at the beginning of 2024.

AI · Bullish · OpenAI News · Apr 24 · 7/10 · 6

🧠

Introducing ChatGPT and Whisper APIs

OpenAI has released APIs for ChatGPT and Whisper models, allowing developers to integrate these AI capabilities directly into their applications and products. This marks a significant step in making advanced conversational AI and speech recognition technology accessible to third-party developers.

AI · Bullish · OpenAI News · Sep 21 · 7/10 · 7

🧠

Introducing Whisper

OpenAI has trained and open-sourced Whisper, a neural network for speech recognition that achieves human-level robustness and accuracy on English speech. The model represents a significant advancement in AI speech recognition technology and is being made freely available to the community.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio

Researchers introduce VocSim, a training-free benchmark that evaluates how well audio embeddings identify content across diverse sound sources, without parameter updates or labeled data. Across 125k clips spanning speech, animal vocalizations, and environmental sounds, frozen Whisper embeddings perform well overall, but significant generalization gaps remain for low-resource and non-English languages.

AI · Neutral · arXiv – CS AI · May 29 · 5/10

🧠

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

Researchers evaluated nine automatic speech recognition (ASR) models on Dutch child speech datasets, finding that fine-tuned Whisper-medium achieved 5.54% word error rate on clean data but 70.37% on noisy data. Using an utterance-level selection method, they identified 42% of clean recordings as reliable without manual verification, achieving 98.3% precision and significantly reducing annotation overhead for child speech research.
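For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation code used in the paper; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion ("the"), i.e. 2 errors over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Real evaluations usually normalize casing and punctuation before scoring, which this sketch leaves out.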

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

🧠

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Researchers introduce Whisper-MLA, a modified version of OpenAI's Whisper speech recognition model that uses Multi-Head Latent Attention to reduce GPU memory consumption by up to 87.5% while maintaining accuracy. The innovation addresses a key scalability issue with transformer-based ASR models when processing long-form audio.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15

🧠

Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing

Researchers developed Whisper-LLaDA, a diffusion-based large language model for automatic speech recognition that achieves 12.3% relative improvement over baseline models. The study demonstrates that audio-conditioned embeddings are crucial for accuracy improvements, while plain-text processing without acoustic features fails to enhance performance.

AI · Neutral · arXiv – CS AI · Mar 26 · 4/10

🧠

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Researchers developed a new training framework to address contextual exposure bias in Speech-LLMs, where models trained on perfect conversation history perform poorly with error-prone real-world context. Their approach combines teacher error knowledge, context dropout, and direct preference optimization to improve robustness, reducing WER from 5.59% to 5.17% on TED-LIUM 3.
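WER changes are reported either in absolute percentage points or as a relative reduction, and the two are easy to confuse. The short sketch below works through both for the 5.59% and 5.17% figures quoted above; the helper name is hypothetical and the calculation is ordinary arithmetic, not something taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error that was removed."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


baseline_wer = 5.59  # percent, as quoted in the summary
improved_wer = 5.17  # percent, as quoted in the summary

# Absolute change, in percentage points: about 0.42.
absolute_points = baseline_wer - improved_wer

# Relative change, as a fraction of the baseline: about 0.075 (roughly 7.5%).
relative = relative_reduction(baseline_wer, improved_wer)
```

So the same result can be described as a drop of about 0.42 percentage points or as a relative reduction of roughly 7.5%.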

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

🧠

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Researchers introduce Whisper-RIR-Mega, a new benchmark dataset for testing automatic speech recognition robustness in reverberant acoustic environments. The study evaluates five Whisper models and finds that reverberation consistently degrades performance across all model sizes, with word error rates increasing by 0.12 to 1.07 percentage points.

AI · Bullish · arXiv – CS AI · Mar 4 · 4/10 · 2

🧠

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization

Researchers developed a multistage AI approach for Bengali speech transcription and speaker diarization, achieving significant improvements in processing long-form audio recordings. The system used fine-tuned Whisper models and custom segmentation techniques to address the low-resource nature of Bengali in speech technology applications.

AI · Bullish · Hugging Face Blog · Dec 20 · 4/10 · 4

🧠

Speculative Decoding for 2x Faster Whisper Inference

The title indicates that speculative decoding is used to make Whisper inference about 2x faster. No article body was available, so the specific implementation and its implications could not be summarized.
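Because the summary has no implementation details, the following is only a generic sketch of greedy speculative decoding, not the approach described in the Hugging Face post. A small draft model proposes a few tokens, the larger target model checks them, and the longest agreeing prefix is kept, so the output matches what the target would have produced on its own. The two toy "models" are hypothetical stand-ins (plain functions from a token list to the next token), and the target is called once per position for clarity, whereas a real system would verify all proposed tokens in a single batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive greedy decoding, one model call per token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target model checks each proposed position in order.
        accepted: List[Token] = []
        for token in proposal:
            expected = target(out + accepted)
            if token == expected:
                accepted.append(token)      # draft guessed correctly
            else:
                accepted.append(expected)   # replace with the target's token
                break                       # discard the rest of the proposal
        out.extend(accepted)
    return out


# Hypothetical toy models over tokens 0..6.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[Token]) -> Token:
    # Agrees with the target except after token 5, where it guesses wrong.
    return 0 if seq[-1] == 5 else toy_target(seq)


reference_output = greedy_decode(toy_target, [1], 10)
speculative_output = speculative_decode(toy_target, toy_draft, [1], 10)
```

In this greedy form the speedup comes entirely from how often the draft agrees with the target; the decoded text itself is unchanged.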

AI · Neutral · Hugging Face Blog · Nov 3 · 4/10 · 6

🧠

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

The article appears to discuss fine-tuning Whisper, OpenAI's automatic speech recognition model, for multilingual applications using the Hugging Face Transformers library. However, the article body is empty, so no detailed analysis is possible.

Crypto · Neutral · Ethereum Foundation Blog · Dec 31 · 4/10 · 1

⛓️

December Roundup

December saw continued development progress across the Ethereum ecosystem, with ongoing research into proof of stake and sharding following the Singapore workshop. The light client, Whisper, and Swarm protocols advanced while discussions on protocol economics and community governance continued.

$ETH

AI · Neutral · Hugging Face Blog · May 13 · 1/10 · 7

🧠

Blazingly fast whisper transcriptions with Inference Endpoints

The article appears to discuss fast Whisper transcription using Inference Endpoints, but the article body is empty or was not provided, so the details and significance of the approach named in the title cannot be analyzed.