#visual-representation News & Analysis

2 articles tagged with #visual-representation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 1 · 7/10

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

Researchers find that LVLM hallucination robustness depends primarily on architectural design choices rather than on model scaling alone. The study introduces CoSimUE, a benchmark that categorizes hallucinations into three types, and reports that visual encoding quality and semantic alignment strategies reduce errors significantly more than parameter scaling does.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

Researchers propose CroBo, a new visual state representation learning framework that helps robotic agents better understand dynamic environments by encoding both semantic identities and spatial locations of scene elements. The framework uses a global-to-local reconstruction method that compresses observations into compact tokens, achieving state-of-the-art performance on robot policy learning benchmarks.