y0news
AnalyticsDigestsRSSAICrypto
#audio-visual1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 5h ago
๐Ÿง 

Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Researchers developed Crab+, a new Audio-Visual Large Language Model that addresses the problem of negative transfer in multi-task learning, where 55% of tasks typically degrade when trained together. The model introduces explicit cooperation mechanisms and achieves positive transfer in 88% of tasks, outperforming both unified and specialized models.