#vulnerability-detection News & Analysis

47 articles tagged with #vulnerability-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Local LLM Agents as Vulnerable Runtimes: A Source-Code Audit of the Agent Runtime Layer

Researchers introduce CLAWAUDIT, a static analysis framework that identifies implementation-level security vulnerabilities in local LLM agent runtimes such as OpenClaw. The study finds that existing vulnerability detection tools miss 78–86% of agent-specific flaws, while the new framework achieves 66–75% recall on 217 held-out test cases.

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

A new research framework called CWE-Trace challenges the claim that large language models can reliably detect software vulnerabilities: fine-tuned models reach at most 52.1% accuracy and, despite appearing well calibrated, show no genuine security reasoning. In a study of 834 Linux kernel samples, the models exhibit systematic failure patterns that persist across datasets and resist correction through fine-tuning, suggesting that they memorize patterns rather than understand what makes code vulnerable.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Researchers introduced Runtime Skill Audit (RSA), a dynamic analysis method that detects malicious behavior in LLM agent skills by testing them under targeted runtime conditions rather than relying on static code review. RSA achieved 90% accuracy in identifying harmful skills and maintained effectiveness against evolving attacks where static methods failed, addressing a critical security gap in agent-based AI systems.

AI · Bearish · arXiv – CS AI · Jun 11 · 7/10

Are Frontier LLMs Ready for Cybersecurity? Evidence for Vertical Foundation Models from Dual-Mode Vulnerability Benchmarks

A comprehensive evaluation of frontier large language models for cybersecurity tasks reveals they struggle with high false positive rates (10-50%) in vulnerability detection and achieve only 4-8% accuracy in black-box testing, suggesting that specialized domain training and structured methodology matter more than model scale for security applications.

🧠 GPT-5 · 🧠 Claude · 🧠 Gemini

AI × Crypto · Neutral · Crypto Briefing · Jun 10 · 7/10

AI identifies critical bug in Zcash that could have enabled unlimited counterfeit minting

An AI system successfully identified a critical vulnerability in Zcash's protocol that could have permitted unlimited counterfeit token creation, highlighting AI's emerging role in blockchain security auditing. The discovery underscores the importance of advanced detection mechanisms in protecting privacy-focused cryptocurrencies from catastrophic flaws.

AI × Crypto · Bullish · arXiv – CS AI · Jun 5 · 7/10

AttackPathGNN: Cross-function vulnerability detection in smart contracts using state interference graphs and conjunction pooling

Researchers introduce AttackPathGNN, a graph neural network that detects smart contract vulnerabilities by analyzing relationships between functions rather than isolated code patterns. The method achieves a 92.3% F1 score on test datasets and identifies exploits such as reentrancy that existing detectors miss, addressing security gaps exposed by historical attacks like The DAO hack.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications

Researchers have identified widespread Description-Code Inconsistency (DCI) in Model Context Protocol servers, where tool descriptions don't match actual implementations. A study of 2,214 MCP servers found that 9.93% of description-code pairs exhibit inconsistencies, creating security vulnerabilities that enable operational failures and malicious behavior in LLM-powered applications.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Researchers introduce CyberGym-E2E, a large-scale benchmark with 920 real-world vulnerabilities that evaluates AI agents across the complete vulnerability lifecycle—discovery, proof-of-concept generation, and patch creation. This addresses a critical gap in cybersecurity AI evaluation by testing end-to-end remediation capabilities rather than isolated tasks, establishing a new standard for measuring autonomous vulnerability management systems.

AI · Bullish · Blockonomi · Jun 2 · 7/10

Anthropic Expands Project Glasswing to 150 New Cybersecurity Partners

Anthropic has expanded its Project Glasswing cybersecurity initiative to approximately 150 organizations across 15+ countries, including partners from critical infrastructure sectors such as power, water, healthcare, and communications. Early participants using Claude Mythos Preview have already identified over 10,000 high-severity and critical software vulnerabilities, demonstrating the practical value of AI-assisted vulnerability detection.

🏢 Anthropic · 🧠 Claude

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

Researchers present SkillReact, a framework for measuring compositional safety risks in LLM agent skill ecosystems. They find that 18.2% of individually safe skill pairs create genuine safety vulnerabilities when combined, risks that per-skill scanning alone misses. Testing on 211,575 skill pairs from ClawHub shows that execution risk depends on the model, with smaller models such as Haiku more likely than larger models such as Sonnet to execute unsafe tool chains.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

Researchers introduce VULPO, an on-policy LLM optimization framework for vulnerability detection that achieves a 203% improvement over baseline models by incorporating context-aware reasoning and multidimensional reward signals. The approach combines a new ContextVul dataset with specialized fine-tuning to produce more effective security analysis tools that reason through complex code interactions.

AI × Crypto · Neutral · Crypto Briefing · May 27 · 7/10

A16z crypto study shows AI agents can detect DeFi exploits, but executing them is another story

A16z's research demonstrates that AI agents can successfully identify vulnerabilities in DeFi protocols, but face significant practical and technical barriers when attempting to exploit them. The findings underscore the dual-edged nature of AI in blockchain security and highlight the critical importance of developing containment measures to mitigate potential misuse by malicious actors.

AI × Crypto · Bullish · arXiv – CS AI · May 12 · 7/10

CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing

Researchers introduce Chaintrix, an LLM-augmented smart-contract auditing framework that combines AI analysis with deterministic structural verification to reduce false positives. The system achieves 71.7% recall on high-severity vulnerabilities, outperforming existing AI and static analysis tools by 26 percentage points on OpenAI's EVMbench benchmark.

🏢 OpenAI

AI · Bearish · arXiv – CS AI · May 11 · 7/10

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

A comprehensive survey of 87 machine learning vulnerability detection studies finds that the field has stalled despite a decade of research, trapped in self-reinforcing feedback loops that optimize for narrow, artificial problems. The authors identify twelve interconnected pain points spanning datasets, problem formulations, metrics, and evaluation approaches; together these perpetuate a focus on binary, function-level classification of C/C++ code while neglecting vulnerability type prediction, multilingual support, and other detection granularities.

AI · Bearish · arXiv – CS AI · May 7 · 7/10

Syntax- and Compilation-Preserving Evasion of LLM Vulnerability Detectors

Researchers demonstrate that LLM-based vulnerability detectors, which are increasingly used in software security pipelines, can be evaded through syntax-preserving code transformations. Models with 70% or higher accuracy on clean code can miss 87% or more of vulnerabilities after minor edits, and adversarial attacks achieve evasion rates of up to 92.5%. The results raise serious questions about the reliability of AI-driven security tools in production environments.

🧠 GPT-4

AI · Bullish · Crypto Briefing · May 2 · 7/10

NSA tests Anthropic’s Mythos AI for Microsoft cybersecurity flaws

The NSA is testing Anthropic's Mythos AI model to identify cybersecurity vulnerabilities in Microsoft systems, signaling accelerating government adoption of advanced AI for national defense. This development underscores how AI is becoming central to cybersecurity strategy and may influence both defense priorities and the commercial AI landscape.

🏢 Anthropic

AI · Neutral · Crypto Briefing · Apr 11 · 7/10

Brad Gerstner: Detachment from desires fosters personal achievement, Anthropic’s Mythos reveals critical vulnerabilities, and proactive AI measures are essential for cybersecurity | All-In Podcast

On the All-In Podcast, Brad Gerstner discussed vulnerabilities uncovered by Anthropic's Mythos model, highlighting how advanced AI systems are exposing critical software flaws before they are widely exploited. The findings underscore the urgent need for companies to adopt proactive cybersecurity measures as AI capabilities accelerate toward mainstream adoption.

🏢 Anthropic

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

Researchers have created OSS-CRS, an open framework that makes the cyber reasoning systems (CRSs) built for DARPA's AI Cyber Challenge (AIxCC) usable for real-world open-source security. Using it, they ported the winning Atlantis CRS and discovered 10 previously unknown bugs, including three high-severity issues, across 8 open-source projects.

AI × Crypto · Bullish · arXiv – CS AI · Mar 17 · 7/10

Benchmarking Zero-Shot Reasoning Approaches for Error Detection in Solidity Smart Contracts

Researchers benchmarked state-of-the-art LLMs for detecting vulnerabilities in Solidity smart contracts using zero-shot prompting strategies. The study found that Chain-of-Thought and Tree-of-Thought prompting significantly improved recall (95–99%) at the cost of precision, while Claude 3 Opus performed best, with an F1 score of 90.8 in vulnerability classification.

🧠 Claude

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 3

ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

Researchers introduced ZeroDayBench, a new benchmark testing LLM agents' ability to find and patch 22 critical vulnerabilities in open-source code. Testing on frontier models GPT-5.2, Claude Sonnet 4.5, and Grok 4.1 revealed that current LLMs cannot yet autonomously solve cybersecurity tasks, highlighting limitations in AI-powered code security.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5

Automated Vulnerability Detection in Source Code Using Deep Representation Learning

Researchers developed a convolutional neural network (CNN) model that automatically detects vulnerabilities in C source code. Trained on datasets from Draper Labs and NIST, the model achieves higher recall than previous work while maintaining high precision, and it proved effective on real Linux kernel vulnerabilities.

AI × Crypto · Bullish · The Defiant · Feb 18 · 7/10 · 6

OpenAI Unveils AI Benchmark Tool to Enhance Blockchain Security

OpenAI has partnered with Paradigm to launch EVMbench, a new AI benchmark tool designed to evaluate artificial intelligence agents' capabilities in detecting, patching, and exploiting smart contract vulnerabilities. This tool represents a significant step forward in using AI to enhance blockchain security infrastructure.

AI × Crypto · Bullish · Bankless · Feb 18 · 7/10 · 5

OpenAI and Paradigm Introduce 'EVMbench' for AI Agent Benchmarking

OpenAI and Paradigm have launched EVMbench, a new benchmarking tool designed to evaluate AI agents' capabilities in detecting, exploiting, and patching high-severity smart contract vulnerabilities. This represents a significant step toward using AI for automated smart contract security auditing and vulnerability management.

AI × Crypto · Bullish · OpenAI News · Feb 18 · 7/10 · 8

Introducing EVMbench

OpenAI and Paradigm have launched EVMbench, a new benchmark tool designed to evaluate AI agents' capabilities in detecting, patching, and exploiting high-severity vulnerabilities in smart contracts. This collaboration represents a significant step toward improving smart contract security through AI-powered analysis tools.

AI · Bullish · OpenAI News · Oct 30 · 7/10 · 6

Introducing Aardvark: OpenAI’s agentic security researcher

OpenAI has launched Aardvark, an autonomous AI security researcher that can find, validate, and help fix software vulnerabilities at scale. Aardvark is currently in private beta, with early access available by sign-up.