y0news

#large-multimodal-models News & Analysis

2 articles tagged with #large-multimodal-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 26 · 7/10
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

Researchers propose DIG, a training-free framework that improves long-form video understanding by adapting frame selection strategies based on query types. The system uses uniform sampling for global queries and specialized selection for localized queries, achieving better performance than existing methods while scaling to 256 input frames.
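
The query-dependent selection described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the "global"/"localized" labels, and the top-k relevance scoring used for localized queries are all assumptions made for illustration, and the summary does not say how DIG classifies queries or scores frames.

```python
from typing import Sequence


def uniform_indices(num_frames: int, budget: int) -> list[int]:
    """Pick up to `budget` evenly spaced frame indices from `num_frames` frames."""
    if budget <= 0:
        return []
    if budget >= num_frames:
        return list(range(num_frames))
    step = num_frames / budget
    # Take the midpoint of each of the `budget` equal-length segments.
    return [int(step * i + step / 2) for i in range(budget)]


def top_k_indices(scores: Sequence[float], budget: int) -> list[int]:
    """Hypothetical stand-in for localized selection: keep the
    highest-scoring frames and return them in temporal order."""
    if budget <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def select_frames(query_type: str, scores: Sequence[float], budget: int) -> list[int]:
    """Dispatch on an assumed query label: uniform sampling for global
    queries, score-based selection for everything else."""
    if query_type == "global":
        return uniform_indices(len(scores), budget)
    return top_k_indices(scores, budget)
```

Here `scores` stands for per-frame relevance to the query, which would have to come from some external model. A global query ignores the scores and spreads the frame budget over the whole video, while a localized query concentrates it on the most relevant frames.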

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs

Researchers introduced VisioMath, a new benchmark with 1,800 K-12 math problems designed to test Large Multimodal Models' ability to distinguish between visually similar diagrams. The study reveals that current state-of-the-art models struggle with fine-grained visual reasoning, often relying on shallow positional heuristics rather than proper image-text alignment.