AI · Bullish · arXiv - CS AI · 3d ago · 7/10
MASEval: Extending Multi-Agent Evaluation from Models to Systems
MASEval introduces a framework-agnostic evaluation library for multi-agent AI systems that treats the entire system, rather than the model alone, as the unit of analysis. Experiments across 3 benchmarks, models, and frameworks show that framework choice affects performance as much as model selection does, challenging current model-centric evaluation approaches.