AINeutralarXiv – CS AI · 10h ago6/10
🧠
MaD Physics: Evaluating information seeking under constraints in physical environments
Researchers introduce MaD Physics, a benchmark for evaluating AI agents' ability to conduct scientific discovery under realistic resource constraints. The benchmark tests agents' capacity to make informative measurements within budget limits and infer underlying physical laws, using altered physics environments to prevent reliance on training data.
🧠 Gemini