y0news
#benchmarking
3 articles
AI · Bullish · arXiv – CS AI · 4h ago · 12
🧠

DeepEyesV2: Toward Agentic Multimodal Model

DeepEyesV2 is a new agentic multimodal AI model that combines text and image understanding with the ability to invoke external tools such as code execution and web search. The research introduces a two-stage training pipeline and the RealX-Bench evaluation framework, and reports improved real-world reasoning through adaptive tool invocation.

AI · Neutral · arXiv – CS AI · 4h ago · 4
🧠

RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis

Researchers introduce RooflineBench, a framework that uses roofline analysis (attainable performance versus operational intensity) to measure how well small language models (SLMs) run on edge devices. The study finds that sequence length significantly affects performance, that greater model depth leads to efficiency regression, and that structural changes such as Multi-head Latent Attention (MLA) can improve hardware utilization.
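The roofline model behind this kind of analysis fits in a few lines. What follows is a minimal sketch of the textbook roofline formula, attainable FLOP/s = min(peak FLOP/s, bandwidth × operational intensity). It is not RooflineBench's code. The device numbers (10 TFLOP/s, 100 GB/s), the 2048-wide layer, and the helper names are hypothetical, and the byte count assumes ideal caching. The sketch shows how one matmul's operational intensity grows with the number of tokens processed at once, which moves it from the memory-bound side of the ridge point to the compute-bound side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical edge device; the numbers used below are illustrative, not measured."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        """Operational intensity (FLOP/byte) where the roof turns from memory- to compute-bound."""
        return self.peak_flops / self.mem_bandwidth


def attainable_flops(device: Device, intensity: float) -> float:
    """Classic roofline: performance is capped by compute or by bandwidth * intensity."""
    return min(device.peak_flops, device.mem_bandwidth * intensity)


def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """Operational intensity of an (m x k) @ (k x n) matmul.

    Assumes each operand is read from memory once and the output written once
    (ideal caching), with fp16-sized elements by default.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    dev = Device(peak_flops=10e12, mem_bandwidth=100e9)  # hypothetical: 10 TFLOP/s, 100 GB/s
    hidden = 2048  # hypothetical model width
    for tokens in (1, 16, 128, 1024):
        oi = matmul_intensity(tokens, hidden, hidden)
        perf = attainable_flops(dev, oi)
        bound = "memory" if oi < dev.ridge_point else "compute"
        print(f"tokens={tokens:5d}  OI={oi:7.1f} FLOP/B  "
              f"attainable={perf / 1e12:5.2f} TFLOP/s  ({bound}-bound)")
```

With these assumed numbers the ridge point is 100 FLOP/byte. Single-token (decode-style) matmuls sit near 1 FLOP/byte and are memory-bound. Batches of 128 tokens or more cross into the compute-bound region.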