#evaluation-framework · 2 articles
AI · Neutral · arXiv – CS AI · 6h ago

Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study

Researchers developed a comprehensive evaluation framework for Graph Neural Networks (GNNs) using formal specification methods, creating 336 new datasets to test GNN expressiveness across 16 fundamental graph properties. The study reveals that no single pooling approach performs consistently well across all properties: attention-based pooling excels at generalization, while second-order pooling offers better sensitivity.
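The summary contrasts two pooling (readout) operators without defining them. A minimal sketch of their standard forms, computed over a graph's node-embedding matrix — the function names and shapes here are illustrative, not taken from the paper:

```python
import numpy as np

def attention_pool(X, w):
    """Attention-based pooling: a softmax-weighted sum of node embeddings.

    X: (n_nodes, d) node embeddings; w: (d,) learned scoring vector.
    Returns a graph embedding of shape (d,).
    """
    scores = X @ w
    a = np.exp(scores - scores.max())  # stable softmax over nodes
    a /= a.sum()
    return a @ X

def second_order_pool(X):
    """Second-order pooling: flattened outer-product statistics X^T X / n.

    Captures pairwise feature correlations, which is one reason it can be
    more sensitive to structural properties; output shape is (d*d,).
    """
    n = X.shape[0]
    return (X.T @ X / n).ravel()

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # toy graph: 5 nodes, 3-dim embeddings
w = rng.normal(size=3)

g_att = attention_pool(X, w)   # shape (3,): same dimension as node features
g_so = second_order_pool(X)    # shape (9,): quadratic in feature dimension
```

The shapes hint at the trade-off: attention pooling keeps the embedding compact, while second-order pooling expands it quadratically to retain correlation information.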

AI · Bullish · arXiv – CS AI · 6h ago

MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

MOSAIC is a new open-source platform that enables cross-paradigm comparison and evaluation of different decision-makers — reinforcement learning agents, large language models, vision-language models, and humans — within the same environment. The platform introduces three key technical contributions: an IPC-based worker protocol, an operator abstraction for unified interfaces, and a deterministic evaluation framework for reproducible research.
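The summary names an "operator abstraction" and a "deterministic evaluation framework" without detail. A minimal sketch of what such a unified interface might look like — all class and method names here are hypothetical, not MOSAIC's actual API:

```python
from abc import ABC, abstractmethod
import random

class Operator(ABC):
    """Unified interface: any decision-maker (RL policy, LLM, VLM wrapper,
    or human input adapter) exposes the same act() method, so the evaluation
    loop is agnostic to the paradigm behind it."""
    @abstractmethod
    def act(self, observation: dict) -> str: ...

class RandomPolicyOperator(Operator):
    """Stand-in for an RL policy; seeded so runs are reproducible."""
    def __init__(self, actions, seed=0):
        self._rng = random.Random(seed)
        self._actions = actions
    def act(self, observation):
        return self._rng.choice(self._actions)

class ScriptedOperator(Operator):
    """Stand-in for an LLM/VLM agent: a fixed observation-to-action rule."""
    def act(self, observation):
        return "cooperate" if observation.get("trust", 0.0) > 0.5 else "defect"

def evaluate(operator: Operator, episodes: int, seed: int = 0) -> list:
    """Deterministic evaluation: the environment stream is derived from a
    fixed seed, so the same (operator, seed) pair yields the same trajectory."""
    rng = random.Random(seed)
    return [operator.act({"trust": rng.random()}) for _ in range(episodes)]

run_a = evaluate(ScriptedOperator(), episodes=5, seed=42)
run_b = evaluate(ScriptedOperator(), episodes=5, seed=42)
assert run_a == run_b  # reproducible across runs
```

The design point is that heterogeneity lives behind the interface: swapping an RL policy for a human adapter changes only which `Operator` subclass is constructed, never the evaluation loop itself.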