AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Evaluating LLMs for Real-World Web Vulnerability Detection
Researchers benchmarked six large language models on detecting real-world web vulnerabilities in WordPress plugins. Every model identified some security issues, but detection rates ranged widely, from 35% to 63%, and no model produced consistent results across repeated tests. For security practitioners, the findings show both the promise and the critical limitations of LLM-based vulnerability detection.
🧠 GPT-5 · 🧠 Claude · 🧠 Opus