#performance-benchmarking News & Analysis

3 articles tagged with #performance-benchmarking. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

MASEval: Extending Multi-Agent Evaluation from Models to Systems

MASEval introduces a new framework-agnostic evaluation library for multi-agent AI systems that treats entire systems rather than just models as the unit of analysis. Research across 3 benchmarks, models, and frameworks reveals that framework choice impacts performance as much as model selection, challenging current model-centric evaluation approaches.
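
As a hypothetical illustration of treating the system rather than the model as the unit of analysis, the sketch below scores each (framework, model) pair as one system and compares how far the mean score moves when grouping by framework versus by model. It does not use MASEval's API; the function name `marginal_spread` and all scores are made up, not results from the paper.

```python
from collections import defaultdict
from statistics import mean


def marginal_spread(scores: dict[tuple[str, str], float], axis: int) -> float:
    """Range (max - min) of mean scores after grouping systems by one factor.

    Each key is a (framework, model) pair, i.e. one complete system.
    axis=0 groups by framework, axis=1 groups by model.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for key, score in scores.items():
        groups[key[axis]].append(score)
    means = [mean(values) for values in groups.values()]
    return max(means) - min(means)


# Toy, made-up scores for illustration only (hypothetical names).
toy_scores = {
    ("framework_a", "model_x"): 0.60,
    ("framework_a", "model_y"): 0.70,
    ("framework_b", "model_x"): 0.40,
    ("framework_b", "model_y"): 0.50,
}

framework_effect = marginal_spread(toy_scores, axis=0)
model_effect = marginal_spread(toy_scores, axis=1)
print(f"framework spread: {framework_effect:.2f}, model spread: {model_effect:.2f}")
```

With these toy numbers, swapping frameworks shifts the mean score by 0.20 while swapping models shifts it by 0.10; a real analysis would use repeated runs per system and a proper variance decomposition rather than a simple range.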

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Agent-as-a-Router: Agentic Model Routing for Coding Tasks

Researchers propose Agent-as-a-Router, a framework that dynamically routes coding tasks to the most suitable LLM among multiple providers by accumulating execution-grounded experience during deployment. Instantiated as ACRouter, the approach reports a 15.3% performance gain over static routers. The authors also introduce CodeRouterBench, a benchmark of ~10K tasks evaluated across 8 frontier LLMs. Together, the work targets the need for intelligent model selection in multi-provider environments.
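
The summary does not describe how ACRouter stores or uses its experience. Purely as an assumption, the sketch below shows one simple form of execution-grounded routing: keep per-task-type pass counts for every model, route to the model with the best smoothed pass rate, and explore occasionally with an epsilon-greedy rule. `ExperienceRouter`, the task-type labels, and the epsilon-greedy policy are hypothetical stand-ins, not the paper's method.

```python
import random
from collections import defaultdict


class ExperienceRouter:
    """Hypothetical epsilon-greedy router that learns from execution outcomes.

    A generic bandit-style stand-in, not ACRouter's published algorithm.
    """

    def __init__(self, models: list[str], epsilon: float = 0.1, seed: int = 0):
        self.models = models
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (task_type, model) -> [successes, attempts]
        self.stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def success_rate(self, task_type: str, model: str) -> float:
        successes, attempts = self.stats.get((task_type, model), [0, 0])
        # Laplace smoothing: an unseen model starts at 0.5 instead of undefined.
        return (successes + 1) / (attempts + 2)

    def route(self, task_type: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)  # explore a random model
        # Exploit: pick the model with the best smoothed pass rate so far.
        return max(self.models, key=lambda m: self.success_rate(task_type, m))

    def record(self, task_type: str, model: str, passed: bool) -> None:
        """Update experience with an execution signal, e.g. whether tests passed."""
        entry = self.stats[(task_type, model)]
        entry[0] += int(passed)
        entry[1] += 1


# Hypothetical usage: model_b keeps passing "refactor" tasks, model_a keeps failing.
router = ExperienceRouter(["model_a", "model_b"], epsilon=0.0)
for _ in range(5):
    router.record("refactor", "model_a", passed=False)
    router.record("refactor", "model_b", passed=True)
print(router.route("refactor"))  # model_b
```

A static router would fix the model choice up front; here each recorded outcome can change later routing decisions for that task type.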

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

Researchers present the first systematic study of performance-energy trade-offs in multi-request LLM inference workflows, using NVIDIA A100 GPUs and vLLM/Parrot serving systems. The study identifies batch size as the most impactful optimization lever, though effectiveness varies by workload type, and reveals that workflow-aware scheduling can reduce energy consumption under power constraints.

🏢 Nvidia
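
A back-of-envelope way to see why batch size matters for energy in the study above, assuming average GPU power and batch latency have been measured: energy per request is average power times batch latency divided by the number of requests in the batch. The `BatchMeasurement` class and its numbers are hypothetical and are not figures from the study.

```python
from dataclasses import dataclass


@dataclass
class BatchMeasurement:
    """One hypothetical measurement of a serving run at a fixed batch size."""

    batch_size: int
    avg_power_watts: float  # average GPU power while the batch runs
    batch_latency_s: float  # wall-clock time to finish the whole batch

    def energy_per_request_joules(self) -> float:
        # Energy (J) = power (W) * time (s), shared evenly across the batch.
        return self.avg_power_watts * self.batch_latency_s / self.batch_size


# Made-up numbers for illustration only; not results from the study.
runs = [
    BatchMeasurement(batch_size=1, avg_power_watts=200.0, batch_latency_s=2.0),
    BatchMeasurement(batch_size=8, avg_power_watts=300.0, batch_latency_s=4.0),
]
for run in runs:
    print(run.batch_size, run.energy_per_request_joules())
```

In this toy case, raising the batch size from 1 to 8 increases both power and latency but cuts energy per request from 400 J to 150 J, because the cost is shared across more requests. The study notes that the benefit varies by workload type, which this single-formula model does not capture.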