y0news

#vulnerability-detection News & Analysis

16 articles tagged with #vulnerability-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · Crypto Briefing · 5d ago · 7/10

Brad Gerstner: Detachment from desires fosters personal achievement, Anthropic’s Mythos reveals critical vulnerabilities, and proactive AI measures are essential for cybersecurity | All-In Podcast

Brad Gerstner discussed Anthropic's AI model discoveries on the All-In Podcast, highlighting how advanced AI systems are exposing critical software vulnerabilities before they become widely exploited. The findings underscore the urgent need for companies to implement proactive cybersecurity measures as AI capabilities accelerate toward mainstream adoption.

🏢 Anthropic
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

OSS-CRS: Liberating AIxCC Cyber Reasoning Systems for Real-World Open-Source Security

Researchers have created OSS-CRS, an open framework that makes DARPA's AI Cyber Challenge systems usable for real-world cybersecurity applications. The system successfully ported the winning Atlantis CRS and discovered 10 previously unknown bugs, including 3 high-severity issues, across 8 open-source projects.

AI × Crypto · Bullish · arXiv – CS AI · Mar 17 · 7/10

Benchmarking Zero-Shot Reasoning Approaches for Error Detection in Solidity Smart Contracts

Researchers benchmarked state-of-the-art LLMs for detecting vulnerabilities in Solidity smart contracts using zero-shot prompting strategies. The study found that Chain-of-Thought and Tree-of-Thought approaches significantly improved recall (95-99%) but reduced precision, while Claude 3 Opus achieved the best classification performance with an F1 score of 90.8.
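The zero-shot strategies compared above differ mainly in how the prompt asks the model to reason. A minimal sketch of a Chain-of-Thought prompt builder, with an illustrative class list and wording (the paper's actual templates are not reproduced here):

```python
# Sketch of a zero-shot Chain-of-Thought prompt for Solidity vulnerability
# detection. The vulnerability classes and wording are illustrative
# assumptions, not the paper's actual templates.

VULN_CLASSES = ["reentrancy", "integer overflow", "unchecked call", "access control"]

def build_cot_prompt(source: str) -> str:
    """Assemble a single-contract Chain-of-Thought audit prompt."""
    classes = ", ".join(VULN_CLASSES)
    return (
        "You are a smart-contract security auditor.\n"
        f"Candidate vulnerability classes: {classes}.\n"
        "Think step by step: (1) trace external calls and state writes, "
        "(2) check arithmetic and access control, (3) conclude.\n"
        "Answer with the classes found, or 'none'.\n\n"
        "Solidity source:\n" + source
    )

prompt = build_cot_prompt("contract Bank { function withdraw() public {} }")
```

The Tree-of-Thought variant would branch this into several independent reasoning paths and aggregate their verdicts.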

🧠 Claude
AI · Bearish · arXiv – CS AI · Mar 4 · 7/10

ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

Researchers introduced ZeroDayBench, a new benchmark testing LLM agents' ability to find and patch 22 critical vulnerabilities in open-source code. Testing on frontier models GPT-5.2, Claude Sonnet 4.5, and Grok 4.1 revealed that current LLMs cannot yet autonomously solve cybersecurity tasks, highlighting limitations in AI-powered code security.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Automated Vulnerability Detection in Source Code Using Deep Representation Learning

Researchers developed a convolutional neural network model that can automatically detect vulnerabilities in C source code using deep learning techniques. The model was trained on datasets from Draper Labs and NIST, achieving higher recall than previous work while maintaining high precision and demonstrating effectiveness on real Linux kernel vulnerabilities.
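A pipeline of this kind starts by lexing source code into a flat token sequence before embedding and convolution. A toy version of that front end, assuming heavily simplified token categories (the actual work uses a much richer lexer and vocabulary):

```python
import re

# Toy C lexer of the kind such a pipeline feeds into an embedding + CNN.
# The token categories below are a simplification for illustration.
TOKEN_RE = re.compile(
    r"(?P<ident>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[{}()\[\];,=+\-*/<>!&|.]+)"
)

def lex(source: str) -> list[str]:
    """Map C source to a flat token sequence for the model."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        # Bucket numeric literals into one symbol to shrink the vocabulary.
        tokens.append("<NUM>" if m.lastgroup == "num" else m.group())
    return tokens

tokens = lex("int x = 42;")
```

Each token is then mapped to a learned embedding, and 1-D convolutions over the sequence pick up local patterns associated with vulnerable code.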

AI × Crypto · Bullish · The Defiant · Feb 18 · 7/10

OpenAI Unveils AI Benchmark Tool to Enhance Blockchain Security

OpenAI has partnered with Paradigm to launch EVMbench, a new AI benchmark tool designed to evaluate artificial intelligence agents' capabilities in detecting, patching, and exploiting smart contract vulnerabilities. This tool represents a significant step forward in using AI to enhance blockchain security infrastructure.

AI × Crypto · Bullish · Bankless · Feb 18 · 7/10

OpenAI and Paradigm Introduce 'EVMbench' for AI Agent Benchmarking

OpenAI and Paradigm have launched EVMbench, a new benchmarking tool designed to evaluate AI agents' capabilities in detecting, exploiting, and patching high-severity smart contract vulnerabilities. This represents a significant step toward using AI for automated smart contract security auditing and vulnerability management.

AI × Crypto · Bullish · OpenAI News · Feb 18 · 7/10

Introducing EVMbench

OpenAI and Paradigm have launched EVMbench, a new benchmark tool designed to evaluate AI agents' capabilities in detecting, patching, and exploiting high-severity vulnerabilities in smart contracts. This collaboration represents a significant step toward improving smart contract security through AI-powered analysis tools.

AI · Bullish · OpenAI News · Oct 30 · 7/10

Introducing Aardvark: OpenAI’s agentic security researcher

OpenAI has launched Aardvark, an AI-powered autonomous security researcher that can find, validate, and help fix software vulnerabilities at scale. The system is currently in private beta with early testing available through sign-up.

AI · Neutral · crypto.news · 5d ago · 6/10

AI Cybersecurity Race: OpenAI Finalizes Product While Anthropic Runs Project Glasswing to Hunt Critical Vulnerabilities

OpenAI and Anthropic are escalating competition in AI-powered cybersecurity, with OpenAI finalizing a commercial security product for limited partner deployment while Anthropic operates Project Glasswing, a controlled initiative focused on discovering critical software vulnerabilities. This competitive race signals that both AI labs view cybersecurity as a strategically important application area with commercial and defensive value.

🏢 OpenAI · 🏢 Anthropic
AI · Bullish · OpenAI News · Mar 6 · 5/10

Codex Security: now in research preview

Codex Security, an AI-powered application security agent, has launched in research preview to help developers detect, validate, and patch complex vulnerabilities. The tool analyzes project context to provide more accurate security assessments with reduced false positives.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision

Researchers developed a new inference-time safety mechanism for code-generating AI models that uses retrieval-augmented generation to identify and fix security vulnerabilities in real-time. The approach leverages Stack Overflow discussions to guide AI code revision without requiring model retraining, improving security while maintaining interpretability.
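The retrieval step in an approach like this can be pictured as ranking cached (insecure pattern, advice) pairs against the freshly generated code. A toy sketch using bag-of-words Jaccard overlap as a stand-in for real similarity search; the corpus entries are made up, and the actual system presumably retrieves over Stack Overflow content with embeddings:

```python
import re

# Toy retrieval step for retrieval-augmented code revision: rank cached
# (insecure-pattern, advice) pairs by word overlap with generated code.
# Jaccard similarity stands in for real embedding-based retrieval.

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def retrieve(code: str, corpus: list[tuple[str, str]], k: int = 1) -> list[str]:
    """Return advice from the k corpus entries most similar to `code`."""
    code_toks = tokenize(code)
    scored = []
    for pattern, advice in corpus:
        pat_toks = tokenize(pattern)
        score = len(code_toks & pat_toks) / len(code_toks | pat_toks)
        scored.append((score, advice))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [advice for _, advice in scored[:k]]

corpus = [
    ("subprocess call with shell true", "Avoid shell=True; pass an argument list."),
    ("sql query built by string concatenation", "Use parameterized queries."),
]
hits = retrieve("subprocess.run(cmd, shell=True)", corpus)
```

The retrieved advice is then injected into a revision prompt, letting the model patch its own output without any retraining.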

AI × Crypto · Bearish · CoinTelegraph – AI · Mar 3 · 7/10

OpenZeppelin finds data contamination in OpenAI’s EVMbench

OpenZeppelin discovered significant flaws in OpenAI's EVMbench dataset, including data contamination from training leaks and at least four incorrectly classified high-severity vulnerabilities. This finding raises concerns about the reliability of AI tools used for blockchain security auditing.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10

Learning to Generate Secure Code via Token-Level Rewards

Researchers have developed Vul2Safe, a new framework for generating secure code using large language models, which addresses security vulnerabilities through self-reflection and token-level reinforcement learning. The approach introduces the PrimeVul+ dataset and SRCode training framework to provide more precise optimization of security patterns in code generation.
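Token-level rewards of the kind mentioned here amount to per-token credit assignment: tokens inside a flagged vulnerable span are penalized while the rest earn a small positive reward. A sketch with made-up reward values and span indices, not the paper's actual reward model:

```python
# Toy token-level reward assignment for secure code generation: tokens
# inside flagged vulnerable spans get a negative reward, all others a
# small positive one. Reward values and span source are illustrative.

def token_rewards(tokens: list[str],
                  vuln_spans: list[tuple[int, int]],
                  pos: float = 0.1, neg: float = -1.0) -> list[float]:
    """Per-token rewards; vuln_spans are half-open [start, end) ranges."""
    rewards = [pos] * len(tokens)
    for start, end in vuln_spans:
        for i in range(start, min(end, len(tokens))):
            rewards[i] = neg
    return rewards

toks = ["query", "=", "base_sql", "+", "user_input"]   # naive SQL concatenation
r = token_rewards(toks, vuln_spans=[(3, 5)])           # penalize the concatenation
```

Optimizing against per-token rather than sequence-level rewards lets training target the specific tokens that introduce a vulnerability.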

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

Enhancing Continual Learning for Software Vulnerability Prediction: Addressing Catastrophic Forgetting via Hybrid-Confidence-Aware Selective Replay for Temporal LLM Fine-Tuning

Researchers developed Hybrid Class-Aware Selective Replay (Hybrid-CASR), a continual learning method that improves AI-based software vulnerability detection by addressing catastrophic forgetting in temporal scenarios. The method achieved 0.667 Macro-F1 score while reducing training time by 17% compared to baseline approaches on CVE data from 2018-2024.
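Selective replay along these lines can be sketched as a confidence-gated buffer: only samples the model is still unsure about are kept and mixed into later fine-tuning batches, so earlier vulnerability patterns are not forgotten. The threshold, capacity, and sampling policy below are illustrative, not the paper's settings:

```python
import random

# Sketch of confidence-aware selective replay for continual learning.
# Only low-confidence (i.e. still-hard) samples from earlier time periods
# are retained for replay during later fine-tuning rounds.

class ReplayBuffer:
    def __init__(self, capacity: int = 100, conf_threshold: float = 0.8):
        self.capacity = capacity
        self.conf_threshold = conf_threshold  # confident samples are skipped
        self.items: list[str] = []

    def maybe_add(self, sample: str, confidence: float) -> bool:
        """Keep a sample only if the model is not yet confident on it."""
        if confidence < self.conf_threshold and len(self.items) < self.capacity:
            self.items.append(sample)
            return True
        return False

    def replay_batch(self, k: int) -> list[str]:
        """Draw up to k stored samples to mix into the current batch."""
        return random.sample(self.items, min(k, len(self.items)))

buf = ReplayBuffer()
buf.maybe_add("2018-era vulnerable function", confidence=0.55)  # kept: still hard
buf.maybe_add("2024-era patched function", confidence=0.95)     # skipped: easy
```

Replaying only hard examples is what keeps the replay budget (and hence training time) small while still countering catastrophic forgetting.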