AI · Neutral · arXiv – CS AI · 6h ago
Measuring What AI Systems Might Do: Towards A Measurement Science in AI
Researchers argue that current AI evaluation methods fail to measure AI capabilities and propensities properly, and that both should be treated as dispositional properties: tendencies that show up only under particular conditions. The paper proposes a more scientific framework for AI evaluation. Instead of relying on simple benchmark averages, it calls for mapping the causal relationships between contextual conditions and behavioral outputs.
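To illustrate why a single benchmark average can hide a disposition, here is a minimal sketch. It is not the paper's method. The models, the condition labels `context_x` and `context_y`, and the helper functions are all hypothetical. Two made-up models get the same pooled score but behave very differently once results are broken down by condition. Per-condition success rates are only bookkeeping. Reading them as causal effects assumes the evaluator assigns each condition directly, as in a controlled experiment.

```python
from collections import defaultdict

def benchmark_average(records):
    """Pooled success rate over all trials, ignoring context (hypothetical helper)."""
    return sum(success for _, success in records) / len(records)

def conditional_profile(records):
    """Success rate per contextual condition, i.e. P(success | condition)."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for condition, success in records:
        totals[condition] += 1
        successes[condition] += success
    return {c: successes[c] / totals[c] for c in totals}

# Hypothetical data: (condition, success flag), 4 trials per condition per model.
# Condition names and outcomes are invented purely for illustration.
model_a = [("context_x", 1), ("context_x", 0), ("context_x", 1), ("context_x", 0),
           ("context_y", 1), ("context_y", 0), ("context_y", 1), ("context_y", 0)]
model_b = [("context_x", 1)] * 4 + [("context_y", 0)] * 4

print(benchmark_average(model_a), benchmark_average(model_b))  # 0.5 0.5
print(conditional_profile(model_a))  # {'context_x': 0.5, 'context_y': 0.5}
print(conditional_profile(model_b))  # {'context_x': 1.0, 'context_y': 0.0}
```

Both models score 0.5 on the pooled average. Model B, however, always succeeds in one condition and never in the other. The contrast between pooled and per-condition views is the kind of context-dependence that an average alone cannot reveal.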