What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness
Researchers identify that LVLM hallucination robustness depends primarily on architectural design choices rather than model scaling alone. The study introduces CoSimUE, a benchmark categorizing hallucinations into three types and reveals that visual encoding quality and semantic alignment strategies significantly outperform parameter scaling in reducing errors.