🧠 AI | 🟢 Bullish | Importance 6/10

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

arXiv – CS AI | Myeongsoo Kim, Dingmin Wang, Siwei Cui, Farima Farmahinifarahani, Shweta Garg, Baishakhi Ray, Terry Yue Zhuo, Rajdeep Mukherjee, Varun Kumar
🤖AI Summary

Researchers introduce TRAJEVAL, a diagnostic framework that decomposes an AI code agent's trajectory into three stages (search, read, edit), so that a failure can be traced to a specific stage instead of being reported only as a binary pass/fail outcome. In an analysis of 16,758 trajectories, real-time feedback derived from trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.
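To make the stage decomposition concrete, here is a minimal Python sketch. It assumes each stage is scored by precision and recall against a set of targets the fix actually needed; the summary does not state the paper's exact metrics. The `Step` type, the `stage_metrics` helper, and the file and function names are all hypothetical.

```python
from dataclasses import dataclass

STAGES = ("search", "read", "edit")


@dataclass(frozen=True)
class Step:
    stage: str   # one of STAGES
    target: str  # hypothetical identifier for a file or function


def stage_metrics(steps, needed):
    """Return {stage: (precision, recall)} for one trajectory.

    Assumption: `needed` maps each stage to the set of targets the
    fix actually required. This scoring is illustrative, not the
    paper's confirmed definition.
    """
    metrics = {}
    for stage in STAGES:
        touched = {s.target for s in steps if s.stage == stage}
        gold = needed.get(stage, set())
        hits = touched & gold
        precision = len(hits) / len(touched) if touched else 0.0
        recall = len(hits) / len(gold) if gold else 0.0
        metrics[stage] = (precision, recall)
    return metrics


# Invented trajectory: the agent finds the right file and reads the
# right function, but edits the wrong one.
trajectory = [
    Step("search", "utils/parse.py"),
    Step("search", "utils/io.py"),
    Step("read", "parse.py::tokenize"),
    Step("read", "parse.py::normalize"),
    Step("read", "io.py::load"),
    Step("read", "io.py::save"),
    Step("edit", "parse.py::normalize"),
]
needed = {
    "search": {"utils/parse.py"},
    "read": {"parse.py::tokenize"},
    "edit": {"parse.py::tokenize"},
}

metrics = stage_metrics(trajectory, needed)
for stage in STAGES:
    precision, recall = metrics[stage]
    print(f"{stage}: precision={precision:.2f} recall={recall:.2f}")
```

In this invented example a pass/fail score would only record a failure. The per-stage numbers show that search and read reached the right code, with low read precision from examining extra functions, and that the edit stage is where the run went wrong.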

Key Takeaways
  • TRAJEVAL decomposes AI agent trajectories into search, read, and edit stages for fine-grained performance diagnosis.
  • Every agent analyzed examines approximately 22x more functions than necessary, an inefficiency shared across all models studied.
  • Different AI models show distinct failure patterns: GPT-5 locates the relevant code well but targets its edits poorly, while Qwen-32B fails earlier, at file discovery.
  • Real-time trajectory feedback improved model performance by 2.2-4.6 percentage points while cutting costs by 20-31%.
  • The framework also supports prediction: trajectory signals predict model-level Pass@1 with a mean absolute error of 0.87-2.1%.
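
For readers unfamiliar with the error measure in the last takeaway, the following Python sketch shows how a mean absolute error over model-level Pass@1 predictions is computed. The Pass@1 values are invented for illustration and are not results from the paper; `mean_absolute_error` is a hypothetical helper.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Invented Pass@1 scores (in percent) for three hypothetical models.
predicted_pass_at_1 = [52.0, 41.5, 63.0]
actual_pass_at_1 = [50.0, 43.0, 63.5]

mae = mean_absolute_error(predicted_pass_at_1, actual_pass_at_1)
print(f"MAE: {mae:.2f}")
```

The absolute errors here are 2.0, 1.5, and 0.5, so the mean is 4.0 / 3, roughly 1.33. When Pass@1 is expressed in percent, this error is naturally read in percentage points.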