AI · Bearish · arXiv CS AI · 14h ago · 7/10
Sanity Checks for Agentic Data Science
Researchers propose lightweight sanity checks for agentic data science (ADS) systems, aimed at catching falsely optimistic conclusions that users struggle to spot on their own. Grounded in the Predictability-Computability-Stability (PCS) framework, the checks test whether AI agents such as OpenAI Codex can reliably distinguish signal from noise. Across 11 real datasets, the checks flagged affirmative conclusions as unsupported in more than half, even when an individual run appeared to support them.
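The summary does not spell out how the checks work, so the following is only a minimal sketch of one generic way to probe whether an analysis separates signal from noise: shuffle the outcome so any real relationship is destroyed, rerun the analysis, and count how often it still answers "yes". Every name here (`permutation_sanity_check`, `always_yes`, `correlation_analyst`, the 0.8 threshold) is a hypothetical stand-in chosen for illustration, not the authors' method or any real agent API.

```python
import random
from typing import Callable, Sequence

# Hypothetical stand-in for an ADS agent: takes data, returns True when it
# reports an affirmative conclusion ("there is a real effect").
Analysis = Callable[[Sequence[float], Sequence[float]], bool]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_sanity_check(analyze: Analysis, x: Sequence[float],
                             y: Sequence[float], n_perm: int = 20,
                             seed: int = 0) -> float:
    """Fraction of outcome-shuffled reruns that still yield an affirmative answer.

    Shuffling y breaks any true x-y relationship, so a high fraction suggests
    the analysis says "yes" regardless of whether signal is present.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        if analyze(x, y_shuffled):
            hits += 1
    return hits / n_perm


def always_yes(x: Sequence[float], y: Sequence[float]) -> bool:
    """An over-optimistic analyst that always reports an effect."""
    return True


def correlation_analyst(x: Sequence[float], y: Sequence[float]) -> bool:
    """Reports an effect only when |r| exceeds an illustrative threshold."""
    return abs(pearson(x, y)) > 0.8


# Toy data with a genuine linear relationship.
x = [float(i) for i in range(30)]
y = [2.0 * v + 1.0 for v in x]

optimistic_rate = permutation_sanity_check(always_yes, x, y)
careful_rate = permutation_sanity_check(correlation_analyst, x, y)
print(optimistic_rate, careful_rate)
```

Both analysts answer "yes" on the original data, so a single run cannot tell them apart; only the shuffled reruns separate the one that tracks signal from the one that does not.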
OpenAI