
When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning

arXiv – CS AI | Ruixiang Mao, Xiangnan Ma, Dan Chen, Ziming Zhu, Yuan Ge, Aokai Hao, Haishu Zhao, Yifu Huo, Qing Yang, Kaiyan Chang, Xiaoqian Liu, Chenglong Wang, Qiaozhi He, Tong Xiao, Jingbo Zhu
🤖AI Summary

Researchers identified a critical problem in Large Audio-Language Models (LALMs): audio perception deteriorates as reasoning chains grow longer. They developed the MPAR² framework, trained with reinforcement learning, which improved perception performance from 31.74% to 63.51% and reached 74.59% accuracy on the MMAU benchmark.

Key Takeaways
  • Large Audio-Language Models experience audio perception decay as reasoning chains become longer, creating a fundamental bottleneck.
  • Traditional test-time scaling approaches show marginal or negative gains in LALMs compared to direct answering methods.
  • The CAFE evaluation framework was introduced to precisely quantify audio reasoning errors in these models.
  • The MPAR² paradigm uses reinforcement learning to decompose complex questions into perception-rich sub-problems.
  • The new approach nearly doubled perception performance and lets models dynamically adapt their reasoning budget to task complexity.
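The decomposition-plus-budget idea in the takeaways above can be illustrated with a minimal sketch. Everything here is hypothetical: the complexity heuristic, the prompt wording, and the `ask_model` stub are illustrative stand-ins, not the paper's actual MPAR² implementation.

```python
# Hypothetical sketch of multi-step perception-aware reasoning:
# ask the model perception sub-questions first, then answer the original
# question conditioned on the collected evidence. The number of sub-steps
# (the "reasoning budget") scales with a toy estimate of task complexity.

def estimate_complexity(question):
    """Toy proxy: more clauses -> more perception sub-steps (1 to 3)."""
    return min(3, 1 + question.count(",") + question.count(" and "))

def answer_with_perception_steps(question, ask_model):
    """ask_model(prompt) -> str is a stand-in for an LALM call."""
    budget = estimate_complexity(question)
    evidence = []
    for step in range(budget):
        # Each sub-query re-grounds the model in the audio before reasoning.
        sub_q = (f"Perception step {step + 1}/{budget}: "
                 f"what in the audio is relevant to: {question}")
        evidence.append(ask_model(sub_q))
    final_prompt = "Evidence: " + " | ".join(evidence) + f" Question: {question}"
    return ask_model(final_prompt)

# Stub "model" that records prompts, to show the control flow.
calls = []
def stub(prompt):
    calls.append(prompt)
    return "ok"

result = answer_with_perception_steps(
    "What instrument plays, and when does it stop?", stub)
# Two clauses -> budget of 3 perception calls, plus 1 final answering call.
```

The key design point the paper's takeaways suggest is that perception calls happen repeatedly along the chain rather than only once at the start, which is what counters perception decay on long reasoning traces.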