y0news

Crab+: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

arXiv – CS AI | Dongnuan Cai, Henghui Du, Chang Zhou, Xi Chen, Dan Guo, Hongyuan Zhang, Xuelong Li, Di Hu
🤖AI Summary

Researchers developed Crab+, a new Audio-Visual Large Language Model designed to address negative transfer in multi-task learning. When conventional audio-visual models are trained on many tasks jointly, about 55% of tasks perform worse than they do under single-task training. Crab+ introduces explicit cooperation mechanisms and achieves positive transfer in nearly 88% of tasks, outperforming both unified and specialized models.
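The transfer figures above compare each task's score under joint training with its score under single-task training. Below is a minimal sketch of that bookkeeping, assuming higher scores are better and that ties count as neither positive nor negative transfer. The function name `transfer_rates` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple


def transfer_rates(single: Dict[str, float], multi: Dict[str, float]) -> Tuple[float, float]:
    """Return (positive_share, negative_share) over tasks scored in both settings.

    Assumption: higher scores are better. A task shows positive transfer when the
    jointly trained model beats its single-task counterpart and negative transfer
    when it scores lower; exact ties count as neither.
    """
    tasks = sorted(single.keys() & multi.keys())
    if not tasks:
        raise ValueError("no tasks are scored in both settings")
    positive = sum(1 for t in tasks if multi[t] > single[t])
    negative = sum(1 for t in tasks if multi[t] < single[t])
    return positive / len(tasks), negative / len(tasks)


# Hypothetical toy scores for four unnamed tasks (illustrative only).
single_task = {"task_a": 70.0, "task_b": 60.0, "task_c": 50.0, "task_d": 40.0}
multi_task = {"task_a": 72.0, "task_b": 58.0, "task_c": 51.0, "task_d": 40.0}
print(transfer_rates(single_task, multi_task))  # (0.5, 0.25)
```

Under this definition, a "55% negative transfer" result means that just over half of the evaluated tasks lost accuracy when trained jointly.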

Key Takeaways
  • Conventional multi-task audio-visual models suffer from negative transfer, with 55% of tasks degrading relative to single-task training.
  • Crab+ introduces the AV-UIE v2 dataset, with 222K samples drawn from 17 datasets and covering 7 tasks, for comprehensive audio-visual understanding.
  • The model uses Interaction-aware LoRA (I-LoRA) with dynamic routing to coordinate different audio-visual interaction patterns.
  • Crab+ achieves positive transfer in nearly 88% of tasks, reversing the typical negative transfer trend in multi-task learning.
  • The unified model outperforms both existing unified models and specialized single-task models across various benchmarks.
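The idea of routing among several low-rank adapters, as in I-LoRA, can be sketched as a generic mixture of LoRA experts. This is not the paper's implementation: the function `routed_lora_forward`, the per-input softmax router, and the `alpha / r` scaling (borrowed from standard LoRA) are assumptions for illustration, and how Crab+ actually defines its interaction-aware adapters and routing may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def routed_lora_forward(
    x: Vector,
    W: Matrix,
    experts: List[Tuple[Matrix, Matrix]],
    router: Matrix,
    alpha: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Hypothetical routed-LoRA layer: y = W x + sum_k g_k * (alpha / r_k) * B_k A_k x.

    x:       input vector (d_in)
    W:       frozen base weight (d_out x d_in)
    experts: list of (A_k, B_k) low-rank adapters, A_k is r_k x d_in, B_k is d_out x r_k
    router:  gating matrix (num_experts x d_in), one logit per adapter
    Returns the output vector and the gate weights g = softmax(router x).
    """
    y = matvec(W, x)                      # frozen base projection
    gates = softmax(matvec(router, x))    # input-dependent adapter weights
    for g, (A, B) in zip(gates, experts):
        r = len(A)                        # adapter rank
        delta = matvec(B, matvec(A, x))   # low-rank update B A x
        y = [yi + g * (alpha / r) * di for yi, di in zip(y, delta)]
    return y, gates


# Two rank-1 adapters on a 2-d identity layer; a zero router mixes them equally.
x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0]]
experts = [([[1.0, 0.0]], [[1.0], [0.0]]), ([[0.0, 1.0]], [[0.0], [1.0]])]
y, gates = routed_lora_forward(x, W, experts, router=[[0.0, 0.0], [0.0, 0.0]])
print(y, gates)  # [1.5, 3.0] [0.5, 0.5]
```

Because the gates come from a softmax, every adapter contributes and the weights sum to one. A top-k router that activates only the highest-scoring adapters is a common sparse alternative.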