12 articles tagged with #whisper. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · OpenAI News · Apr 24 · 7/10 · 5
🧠 OpenAI has made the GPT-4 API generally available alongside the GPT-3.5 Turbo, DALL·E, and Whisper APIs. The company also announced a deprecation plan for older Completions API models, which will be retired at the beginning of 2024.
AI · Bullish · OpenAI News · Apr 24 · 7/10 · 6
🧠 OpenAI has released APIs for ChatGPT and Whisper models, allowing developers to integrate these AI capabilities directly into their applications and products. This marks a significant step in making advanced conversational AI and speech recognition technology accessible to third-party developers.
AI · Bullish · OpenAI News · Sep 21 · 7/10 · 7
🧠 OpenAI has trained and open-sourced Whisper, a neural network for speech recognition that approaches human-level robustness and accuracy on English speech. The model represents a significant advancement in AI speech recognition technology and is being made freely available to the community.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠 Researchers introduce Whisper-MLA, a modified version of OpenAI's Whisper speech recognition model that uses Multi-Head Latent Attention to reduce GPU memory consumption by up to 87.5% while maintaining accuracy. The innovation addresses a key scalability issue with transformer-based ASR models when processing long-form audio.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15
🧠 Researchers developed Whisper-LLaDA, a diffusion-based large language model for automatic speech recognition that achieves 12.3% relative improvement over baseline models. The study demonstrates that audio-conditioned embeddings are crucial for accuracy improvements, while plain-text processing without acoustic features fails to enhance performance.
AI · Neutral · arXiv – CS AI · Mar 26 · 4/10
🧠 Researchers developed a new training framework to address contextual exposure bias in Speech-LLMs, where models trained on perfect conversation history perform poorly with error-prone real-world context. Their approach combines teacher error knowledge, context dropout, and direct preference optimization to improve robustness, reducing word error rate (WER) from 5.59% to 5.17% on TED-LIUM 3.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3
🧠 Researchers introduce Whisper-RIR-Mega, a new benchmark dataset for testing automatic speech recognition robustness in reverberant acoustic environments. The study evaluates five Whisper models and finds that reverberation consistently degrades performance across all model sizes, with word error rates increasing by 0.12 to 1.07 percentage points.
AI · Bullish · arXiv – CS AI · Mar 4 · 4/10 · 2
🧠 Researchers developed a multistage AI approach for Bengali speech transcription and speaker diarization, achieving significant improvements in processing long-form audio recordings. The system used fine-tuned Whisper models and custom segmentation techniques to address the low-resource nature of Bengali in speech technology applications.
AI · Bullish · Hugging Face Blog · Dec 20 · 4/10 · 4
🧠 The article title points to a technique for speeding up Whisper inference by 2x using speculative decoding. No article body was available, so the specific implementation and its implications could not be analyzed.
AI · Neutral · Hugging Face Blog · Nov 3 · 4/10 · 6
🧠 The article appears to cover fine-tuning Whisper, OpenAI's automatic speech recognition model, for multilingual applications using the Hugging Face Transformers library. The article body is empty, so no detailed analysis is possible.
Crypto · Neutral · Ethereum Foundation Blog · Dec 31 · 4/10 · 1
December saw continued development progress across the Ethereum ecosystem, with ongoing research into proof of stake and sharding following the Singapore workshop. The light client, Whisper, and Swarm protocols advanced while discussions on protocol economics and community governance continued.
$ETH
AI · Neutral · Hugging Face Blog · May 13 · 1/10 · 7
🧠 The article appears to discuss fast Whisper transcription using Inference Endpoints, but the article body is empty or was not provided, so the details and significance of the technology named in the title cannot be analyzed.