Source: arXiv (CS AI)
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Researchers developed Whisper-LLaDA, a diffusion-based large language model for automatic speech recognition (ASR) that achieves a 12.3% relative improvement over baseline models. The study finds that audio-conditioned embeddings are crucial to the accuracy gains: plain-text processing without acoustic features does not improve performance.