AI · Bullish · arXiv – CS AI · 9h ago · 7/10
Audio Interaction Model
Researchers introduce Audio-Interaction, a unified streaming model that lets Large Audio Language Models process audio in real time through a perceive-decide-respond loop, covering tasks from speech recognition to voice chat. The framework, SoundFlow, includes a new 2.6M-item streaming corpus. The model performs competitively on mainstream audio tasks while adding real-time interactive capabilities that offline models lack.