
Revealing Multi-View Hallucination in Large Vision-Language Models

arXiv – CS AI | Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim
🤖 AI Summary

Researchers identify "multi-view hallucination" as a major failure mode in large vision-language models (LVLMs): the models confuse visual information coming from different viewpoints or from different instances of an object. To study it, they built the MVH-Bench benchmark and developed Reference Shift Contrastive Decoding (RSCD), a technique that improved performance by up to 34.6 points without requiring model retraining.

Key Takeaways
  • Large vision-language models suffer from multi-view hallucination, confusing visual information from different viewpoints or instances.
  • MVH-Bench, a benchmark with 4.8k question-answer pairs, was created to systematically test cross-instance and cross-view hallucination.
  • Current LVLMs including Qwen2.5-VL and LLaVA-OneVision struggle to correctly associate visual evidence with corresponding instances.
  • Reference Shift Contrastive Decoding (RSCD) is a training-free technique that suppresses visual interference through attention masking.
  • RSCD improved performance by up to 34.6 points over existing hallucination mitigation methods without requiring model retraining.
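The summary describes RSCD only at a high level (a training-free, contrastive-decoding-style method that uses attention masking to suppress interference from the wrong view). The exact RSCD formulation is not given here; as a rough, generic sketch, contrastive decoding combines logits from two forward passes, boosting tokens supported by the "clean" (interference-masked) pass and penalizing tokens that depend on the distracting reference. The function name, `alpha`, and the toy logits below are illustrative assumptions, not the paper's values.

```python
def contrastive_logits(masked_logits, full_logits, alpha=1.0):
    """Generic contrastive-decoding combination (illustrative, not RSCD's exact rule).

    masked_logits: logits from a pass where attention to the interfering
                   view/instance has been masked out.
    full_logits:   logits from the ordinary full-context pass.
    Tokens whose score collapses once the distractor is masked were
    interference-driven, so the contrast pushes them down.
    """
    return [(1 + alpha) * m - alpha * f
            for m, f in zip(masked_logits, full_logits)]

# Toy example: token 0 looks best in the full pass only because of
# interference from the other view; masking the distractor reveals this.
full = [3.0, 2.5]     # hypothetical logits, full context
masked = [1.0, 2.4]   # hypothetical logits, distractor attention masked
adjusted = contrastive_logits(masked, full, alpha=1.0)
```

With these toy numbers the argmax flips from the interference-driven token 0 to token 1 after the contrast, which is the intended effect of this family of methods.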
Read Original → via arXiv – CS AI