AINeutralarXiv โ CS AI ยท 10h ago6/10
๐ง
NetArena: Dynamic Benchmarks for AI Agents in Network Automation
NetArena introduces a dynamic benchmarking framework for evaluating AI agents in network automation tasks, addressing limitations of static benchmarks through runtime query generation and network emulator integration. The framework reveals that AI agents achieve only 13-38% performance on realistic network queries, significantly improving statistical reliability by reducing confidence-interval overlap from 85% to 0%.