y0news
AnalyticsDigestsSourcesRSSAICrypto
#trajeval1 article
1 articles
AIBullisharXiv โ€“ CS AI ยท 8h ago6/10
๐Ÿง 

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Researchers introduce TRAJEVAL, a diagnostic framework that breaks down AI code agent performance into three stages (search, read, edit) to identify specific failure points rather than just binary pass/fail outcomes. The framework analyzed 16,758 trajectories and found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.

๐Ÿง  GPT-5