Revealing Multi-View Hallucination in Large Vision-Language Models
arXiv · CS AI | Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim
AI Summary
Researchers identify 'multi-view hallucination' as a major problem in large vision-language models (LVLMs): these AI systems confuse visual information drawn from different viewpoints or instances. The authors introduce the MVH-Bench benchmark and a Reference Shift Contrastive Decoding (RSCD) technique, which improved performance by up to 34.6 points without requiring model retraining.
Key Takeaways
- Large vision-language models suffer from multi-view hallucination, confusing visual information from different viewpoints or instances.
- The MVH-Bench benchmark, with 4.8k question-answer pairs, was created to systematically test cross-instance and cross-view hallucination.
- Current LVLMs, including Qwen2.5-VL and LLaVA-OneVision, struggle to correctly associate visual evidence with the corresponding instances.
- Reference Shift Contrastive Decoding (RSCD) is a training-free technique that suppresses visual interference through attention masking.
- RSCD improved performance by up to 34.6 points over existing hallucination mitigation methods, without requiring model retraining.
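The paper's exact masking scheme is not detailed in this summary, but the general contrastive-decoding idea behind techniques like RSCD can be sketched: run the model twice, once on the full input and once with attention to the reference (target) instance masked out, then amplify tokens whose evidence disappears in the masked pass. The function name, `alpha` weight, and toy logits below are illustrative assumptions, not the authors' implementation.

```python
def contrastive_decode(logits_full, logits_masked, alpha=1.0):
    """Generic contrastive-decoding step (assumed form, not the paper's exact rule):
    boost tokens supported by the full pass but weak in the attention-masked pass."""
    return [(1.0 + alpha) * f - alpha * m
            for f, m in zip(logits_full, logits_masked)]

# Toy 4-token vocabulary. The full pass favors the correct token (index 2);
# with attention to the target instance masked, that preference vanishes.
logits_full = [1.0, 0.5, 2.0, 0.2]
logits_masked = [1.0, 0.5, 0.4, 0.2]

adjusted = contrastive_decode(logits_full, logits_masked, alpha=1.0)
token = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this sketch, tokens scored similarly in both passes (indices 0, 1, 3) are left roughly unchanged, while the token whose score depends on the masked visual evidence (index 2) is amplified, which is the intuition behind suppressing interference from the wrong view or instance.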
#vision-language-models #hallucination #multi-view #benchmark #decoding-technique #computer-vision #ai-research #model-performance