arXiv · CS AI · 4h ago
Benchmarking Deflection and Hallucination in Large Vision-Language Models
Researchers introduce VLM-DeflectionBench, a new benchmark of 2,775 samples designed to evaluate how large vision-language models handle conflicting or insufficient evidence. The study finds that most state-of-the-art LVLMs fail to deflect appropriately when presented with noisy or misleading information, revealing critical reliability gaps in knowledge-intensive tasks.