AI · Neutral · arXiv – CS AI · 7h ago · 6/10
Search Discipline for Long-Horizon Research Agents
Researchers identify a critical flaw in autonomous research agents that select candidates by aggregate metrics: when validity is multidimensional but verification collapses it into a single metric, the agents rank invalid candidates first. The study proposes an external audit protocol that evaluates disaggregated behavior to catch invalid candidates that score well on headline metrics.
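The failure mode described above can be illustrated with a small sketch. It is not the paper's protocol: the candidate names, the three dimensions (`accuracy`, `robustness`, `safety`), the mean as the headline metric, and the per-dimension floors are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # hypothetical per-dimension scores in [0, 1]


def aggregate(c: Candidate) -> float:
    """Headline metric: the plain mean over all dimensions (an assumption)."""
    return sum(c.scores.values()) / len(c.scores)


def passes_audit(c: Candidate, floors: dict[str, float]) -> bool:
    """Disaggregated check: every dimension must meet its own floor."""
    return all(c.scores.get(dim, 0.0) >= floor for dim, floor in floors.items())


def rank_by_aggregate(cands: list[Candidate]) -> list[Candidate]:
    """Rank on the single reduced metric only, best first."""
    return sorted(cands, key=aggregate, reverse=True)


def rank_with_audit(cands: list[Candidate], floors: dict[str, float]) -> list[Candidate]:
    """Same ranking, but candidates failing any dimension are dropped."""
    return [c for c in rank_by_aggregate(cands) if passes_audit(c, floors)]


# Illustrative data: A has the higher mean but fails badly on one dimension.
CANDIDATES = [
    Candidate("A", {"accuracy": 0.98, "robustness": 0.95, "safety": 0.20}),
    Candidate("B", {"accuracy": 0.70, "robustness": 0.68, "safety": 0.69}),
]
FLOORS = {"accuracy": 0.5, "robustness": 0.5, "safety": 0.5}

if __name__ == "__main__":
    print("aggregate ranking:", [c.name for c in rank_by_aggregate(CANDIDATES)])
    print("audited ranking:  ", [c.name for c in rank_with_audit(CANDIDATES, FLOORS)])
```

In this toy setup, candidate A tops the aggregate ranking even though it fails the `safety` floor, while the audited ranking keeps only B. The sketch applies the audit as a separate filter rather than folding it into the score, to mirror the idea of an external check on disaggregated behavior.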