AINeutralarXiv β CS AI Β· 7h ago6/10
π§
COHERENCE: Benchmarking Fine-Grained Image-Text Alignment in Interleaved Multimodal Contexts
Researchers introduced COHERENCE, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) on their ability to understand fine-grained image-text alignment in interleaved contextsβsuch as documents with mixed text and images. The benchmark contains 6,161 high-quality questions across four domains and includes error analysis to identify specific capability gaps in current models.