arXiv – CS AI
ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer
Researchers introduce ADK Arena, an automated evaluation framework that uses LLMs as proxy developers to benchmark 51 Python Agent Development Kits (ADKs) across multiple benchmarks. The study finds significant performance variation across frameworks: generation costs vary by a factor of 5.6, and no single framework dominates. It also finds that documentation and source code are largely substitutable for one another in agent development.