AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition
Researchers apply Partial Information Decomposition (PID), an information-theoretic framework, to analyze how multimodal language models integrate vision and language inputs, separating the information each modality contributes uniquely from what the modalities share (redundancy) and what emerges only from their combination (synergy). The analysis reveals distinct modality-use patterns across task types and identifies visual dominance as a bottleneck in audio-visual fusion systems.
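To make the unique / redundant / synergistic split concrete, here is a minimal sketch on toy discrete variables. It uses the classic Williams-Beer I_min redundancy measure, which is an assumption chosen for illustration and may not be the estimator used in the paper. The function names and the (source 1, source 2, target) tuple layout are hypothetical.

```python
from collections import defaultdict
from math import log2

# A joint distribution is a dict mapping (x1, x2, y) -> probability,
# where x1 and x2 stand in for two modalities and y for the target.

def marginal(joint, idxs):
    """Marginalize the joint onto the given tuple positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idxs)] += p
    return dict(out)

def mutual_information(joint, a_idxs, b_idxs):
    """I(A; B) in bits for the variable groups at a_idxs and b_idxs."""
    pa = marginal(joint, a_idxs)
    pb = marginal(joint, b_idxs)
    pab = marginal(joint, a_idxs + b_idxs)
    total = 0.0
    for key, p in pab.items():
        if p > 0:
            a, b = key[:len(a_idxs)], key[len(a_idxs):]
            total += p * log2(p / (pa[a] * pb[b]))
    return total

def specific_information(joint, src_idx, y):
    """Information one source carries about the single outcome Y = y."""
    py = marginal(joint, [2])
    px = marginal(joint, [src_idx])
    pxy = marginal(joint, [src_idx, 2])
    total = 0.0
    for (x, y_val), p in pxy.items():
        if y_val == y and p > 0:
            p_x_given_y = p / py[(y,)]
            p_y_given_x = p / px[(x,)]
            total += p_x_given_y * log2(p_y_given_x / py[(y,)])
    return total

def pid_imin(joint):
    """Two-source PID with Williams-Beer I_min as the redundancy term."""
    py = marginal(joint, [2])
    redundant = sum(
        p * min(specific_information(joint, 0, y),
                specific_information(joint, 1, y))
        for (y,), p in py.items() if p > 0
    )
    i1 = mutual_information(joint, [0], [2])
    i2 = mutual_information(joint, [1], [2])
    i12 = mutual_information(joint, [0, 1], [2])
    return {
        "redundant": redundant,
        "unique_1": i1 - redundant,
        "unique_2": i2 - redundant,
        "synergistic": i12 - i1 - i2 + redundant,
    }

# Toy cases: y = x1 XOR x2, and x1 = x2 = y (both sources copy the target).
xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

if __name__ == "__main__":
    print("XOR :", pid_imin(xor))
    print("COPY:", pid_imin(copy))
```

In the XOR case neither source alone says anything about the target, so the full bit is synergistic; in the copy case either source alone suffices, so the full bit is redundant. Real model analyses work with high-dimensional representations and need an estimator suited to them, which this toy does not attempt.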