AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Decoupling Reconnaissance and Exploitation: Measuring the Capability Boundaries of LLM-Based Web Penetration Testing
Researchers propose a decoupled evaluation framework for LLM-based penetration testing agents that measures reconnaissance and exploitation separately. The study reveals a significant capability gap: agents achieve a 90% success rate when given accurate vulnerability context but reach only 50% on autonomous reconnaissance. Different architectural designs also show distinct strengths.