y0news
#multimodal-llm3 articles
3 articles
AIBullisharXiv โ€“ CS AI ยท 4h ago7
๐Ÿง 

Reasoning-Driven Multimodal LLM for Domain Generalization

Researchers developed RD-MLDG, a new framework that uses multimodal large language models with reasoning chains to improve domain generalization in deep learning. The approach addresses challenges in cross-domain visual recognition by leveraging reasoning capabilities rather than just visual feature invariance, achieving state-of-the-art performance on standard benchmarks.

AINeutralarXiv โ€“ CS AI ยท 4h ago5
๐Ÿง 

Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

Researchers introduce Ref-Adv, a new benchmark for testing multimodal large language models' visual reasoning capabilities in referring expression tasks. The benchmark reveals that current MLLMs, despite performing well on standard datasets like RefCOCO, rely heavily on shortcuts and show significant gaps in genuine visual reasoning and grounding abilities.

AIBullisharXiv โ€“ CS AI ยท 4h ago5
๐Ÿง 

From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model

Researchers propose a data-efficient framework to convert generative Multimodal Large Language Models into universal embedding models without extensive pre-training. The method uses hierarchical embedding prompts and Self-aware Hard Negative Sampling to achieve competitive performance on embedding benchmarks using minimal training data.