Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults
Researchers introduce LinuxFLBench, a fault localization benchmark built from Linux kernel bugs, and show that current LLM agents struggle with this complex task, reaching only 41.6% accuracy. They also propose LinuxFL+, an enhancement framework that improves accuracy by 7.2% to 11.2% for every agent tested, helping to close a gap in automated debugging of large software systems.