y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#penetration-testing News & Analysis

4 articles tagged with #penetration-testing. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AIBearisharXiv – CS AI · 3d ago7/10
🧠

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Researchers conducted 400 autonomous penetration testing runs across four LLM models against a fixed vulnerable target to measure attack consistency. Results show significant variation in exploitation success rates (25-85%) and distinctive failure modes per model, with Claude and Gemini 2.5 Flash-Lite substantially outperforming GPT-4o-mini and Qwen, raising critical questions about LLM reliability in security-critical autonomous operations.

🏢 Anthropic🧠 GPT-4🧠 Claude
AIBearisharXiv – CS AI · 5d ago7/10
🧠

Lessons from Penetration Tests on Large-Scale Agent Systems

A new research paper presents findings from penetration tests conducted in 2025 against proprietary AI agent systems, examining whether security vulnerabilities in autonomous agents have improved compared to open-source alternatives. The study reveals that execution-capable AI agents face recurring security weaknesses similar to those in traditional software systems, challenging assumptions that proprietary development with stricter standards provides meaningfully better security outcomes.

AIBullisharXiv – CS AI · Mar 47/102
🧠

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Researchers conducted the first comprehensive evaluation comparing AI agents to human cybersecurity professionals in live penetration testing on a university network with 8,000 hosts. The new ARTEMIS AI agent framework placed second overall, discovering 9 vulnerabilities with 82% accuracy and outperforming 9 of 10 human participants while costing significantly less at $18/hour versus $60/hour for human testers.

AIBullisharXiv – CS AI · Mar 36/109
🧠

AWE: Adaptive Agents for Dynamic Web Penetration Testing

Researchers introduced AWE, a memory-augmented multi-agent framework for autonomous web penetration testing that outperforms existing tools on injection vulnerabilities. AWE achieved 87% XSS success and 66.7% blind SQL injection success on benchmark tests, demonstrating superior accuracy and efficiency compared to general-purpose AI penetration testing tools.