y0news
AnalyticsDigestsSourcesRSSAICrypto
#performance-benchmarking1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 3d ago7/10
๐Ÿง 

MASEval: Extending Multi-Agent Evaluation from Models to Systems

MASEval introduces a new framework-agnostic evaluation library for multi-agent AI systems that treats entire systems rather than just models as the unit of analysis. Research across 3 benchmarks, models, and frameworks reveals that framework choice impacts performance as much as model selection, challenging current model-centric evaluation approaches.