#mllms News & Analysis

2 articles tagged with #mllms. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

Researchers introduced PhysTool-Bench, a benchmark testing how well multimodal large language models (MLLMs) can recognize and use physical tools in real-world scenarios. Testing 13 leading models revealed significant limitations: even the best performer (Gemini-3.1-Pro) identified only 58.7% of tools in scenes and completed just 21% of end-to-end tasks, exposing critical gaps in perception and functional reasoning for embodied AI applications.

🧠 Gemini

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

A comprehensive review paper presents a unified framework for analyzing video understanding systems powered by multimodal large language models (MLLMs), organizing capabilities into three functional abilities: watching (perception), remembering (memory), and reasoning (inference). The work identifies key challenges in processing long, sparse, and knowledge-intensive video content while operating under computational constraints.