AI · Neutral · arXiv – CS AI · 14h ago · 6/10
LogDx-CI: Benchmarking Log Reduction Tools for LLM Root-Cause Diagnosis
Researchers introduce LogDx-CI, a benchmark that compares 11 log-reduction tools for LLM-based debugging of CI failures. Hybrid grep+tail routers achieve the best cost-quality tradeoff. Agent-loop systems can recover from weak contexts through iterative tool calls, but at higher computational cost.
🏢 OpenAI · 🧠 GPT-5 · 🧠 Claude