AI Bullish arXiv → CS AI · 5h ago
Crab+: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Researchers developed Crab+, an Audio-Visual Large Language Model that addresses negative transfer in multi-task learning. Negative transfer is the effect where training several tasks together makes some of them worse than training each alone; the researchers report that 55% of tasks typically degrade under joint training. Crab+ introduces explicit cooperation mechanisms, achieves positive transfer on 88% of tasks, and outperforms both unified and specialized models.
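To make the 55% and 88% figures concrete, here is a minimal sketch of how such percentages can be computed, assuming transfer is judged per task by comparing a jointly trained model's score with a single-task baseline (higher is better). The paper's exact evaluation protocol may differ; the function name `transfer_rates`, the task names, and all scores below are hypothetical and are not results from the paper.

```python
from typing import Dict


def transfer_rates(single: Dict[str, float], joint: Dict[str, float]) -> Dict[str, float]:
    """Return the fraction of tasks showing negative or positive transfer.

    Hypothetical helper: a task shows negative transfer if its joint-training
    score is below its single-task score, and positive transfer if above.
    Ties count as neither. Assumes higher scores are better.
    """
    if set(single) != set(joint):
        raise ValueError("single-task and joint results must cover the same tasks")
    n = len(single)
    negative = sum(1 for task in single if joint[task] < single[task])
    positive = sum(1 for task in single if joint[task] > single[task])
    return {"negative": negative / n, "positive": positive / n}


# Hypothetical scores for four illustrative tasks (not taken from the paper).
single_task = {"avqa": 70.0, "av_caption": 40.0, "av_seg": 55.0, "av_event": 62.0}
multi_task = {"avqa": 72.0, "av_caption": 38.5, "av_seg": 53.0, "av_event": 63.0}

print(transfer_rates(single_task, multi_task))
```

In this toy example two of the four tasks drop under joint training, so half show negative transfer. A figure like "positive transfer in 88% of tasks" would correspond to the `positive` fraction computed over the paper's full task set.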