#large-audio-language-models News & Analysis

3 articles tagged with #large-audio-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 4 · 7/10


Audio Interaction Model

Researchers introduce Audio-Interaction, a unified streaming model that lets Large Audio Language Models process audio in real time through a perceive-decide-respond loop, handling tasks from speech recognition to voice chat. The framework, SoundFlow, includes a new 2.6M-item streaming corpus and performs competitively on mainstream audio tasks while enabling real-time interaction that offline models cannot support.

AI · Bullish · arXiv – CS AI · May 27 · 7/10


Learning When to Think While Listening in Large Audio-Language Models

Researchers introduce a learnable control mechanism for Large Audio-Language Models that decides dynamically when to reason during real-time speech interaction. The approach balances responsiveness with accuracy by optimizing intermediate reasoning transparency, achieving a 2.7% accuracy improvement while reducing latency on benchmark tasks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10


AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?

Researchers introduce AOR-Bench, the first benchmark for over-refusal in Large Audio Language Models (LALMs), a failure mode in which safety mechanisms incorrectly reject benign queries. Tests of 12 models across six families reveal widespread over-refusal, particularly when audio context could disambiguate potentially harmful speech, and prompt exploration of mitigation strategies such as Chain-of-Thought reasoning.