Revealing Multi-View Hallucination in Large Vision-Language Models
arXiv · CS AI | Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim
AI Summary
Researchers identify 'multi-view hallucination' as a major problem in large vision-language models (LVLMs): these AI systems confuse visual information drawn from different viewpoints or instances. The authors introduce the MVH-Bench benchmark and a Reference Shift Contrastive Decoding (RSCD) technique, which improved performance by up to 34.6 points without requiring model retraining.
Key Takeaways
- Large vision-language models suffer from multi-view hallucination, confusing visual information from different viewpoints or instances.
- The MVH-Bench benchmark, with 4.8k question-answer pairs, was created to systematically test cross-instance and cross-view hallucination.
- Current LVLMs, including Qwen2.5-VL and LLaVA-OneVision, struggle to correctly associate visual evidence with the corresponding instances.
- Reference Shift Contrastive Decoding (RSCD) is a training-free technique that suppresses visual interference through attention masking.
- RSCD improved performance by up to 34.6 points over existing hallucination mitigation methods, without requiring model retraining.
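The paper's exact masking scheme is not detailed in this summary, but the general contrastive-decoding idea behind techniques like RSCD can be sketched: run the model twice, once on the full input and once with attention to the reference (target) instance masked out, then amplify tokens whose evidence disappears in the masked pass. The function name, `alpha` weight, and toy logits below are illustrative assumptions, not the authors' implementation.

```python
def contrastive_decode(logits_full, logits_masked, alpha=1.0):
    """Generic contrastive-decoding step (assumed form, not the paper's exact rule):
    boost tokens supported by the full pass but weak in the attention-masked pass."""
    return [(1.0 + alpha) * f - alpha * m
            for f, m in zip(logits_full, logits_masked)]

# Toy 4-token vocabulary. The full pass favors the correct token (index 2);
# with attention to the target instance masked, that preference vanishes.
logits_full = [1.0, 0.5, 2.0, 0.2]
logits_masked = [1.0, 0.5, 0.4, 0.2]

adjusted = contrastive_decode(logits_full, logits_masked, alpha=1.0)
token = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this sketch, tokens scored similarly in both passes (indices 0, 1, 3) are left roughly unchanged, while the token whose score depends on the masked visual evidence (index 2) is amplified, which is the intuition behind suppressing interference from the wrong view or instance.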
#vision-language-models #hallucination #multi-view #benchmark #decoding-technique #computer-vision #ai-research #model-performance