y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#auditing-framework News & Analysis

1 article tagged with #auditing-framework. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AINeutralarXiv – CS AI · 6h ago6/10
🧠

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Researchers introduce TELBench, a benchmark for identifying errors in deep-research AI agent trajectories, and propose DRIFT, a claim-centric auditing framework that improves error localization accuracy by up to 30 percentage points. The work addresses a critical gap in AI evaluation by moving beyond final-answer assessment to analyze intermediate steps in agent reasoning.