AIBullisharXiv โ CS AI ยท 8h ago6/10
๐ง
TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis
Researchers introduce TRAJEVAL, a diagnostic framework that breaks down AI code agent performance into three stages (search, read, edit) to identify specific failure points rather than just binary pass/fail outcomes. The framework analyzed 16,758 trajectories and found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.
๐ง GPT-5