#code-security News & Analysis

8 articles tagged with #code-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

MPC-Patch-Bench: Security-Aware LLM Code Patch for Multi-Party Computation

Researchers introduce MPC-Patch-Bench, the first repository-level benchmark for evaluating LLM code repair in secure multi-party computation (MPC) systems. Current LLMs achieve only 22.9% functional resolution on MPC tasks, and that figure drops to 17.1% once security and numerical-fidelity constraints are applied, which highlights significant gaps in AI's ability to handle cryptographically sensitive code.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability

CVE-Factory is an automated multi-agent framework that transforms vulnerability metadata into executable security tasks of expert-level quality, achieving 95% correctness. It enables the creation of LiveCVEBench, a continuously updated benchmark of 190 security tasks across 14 programming languages for evaluating AI code security.

🧠 Claude

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

A systematic review of 114 studies reveals that code quality defects in large language models stem primarily from training data imperfections rather than model limitations alone. The research establishes a taxonomy linking 18 propagation mechanisms between data quality issues and generated code failures, while advocating for proactive data governance over reactive post-generation filtering.

AI · Bearish · arXiv – CS AI · May 7 · 7/10

Syntax- and Compilation-Preserving Evasion of LLM Vulnerability Detectors

Researchers demonstrate that LLM-based vulnerability detectors, increasingly used in software security pipelines, can be evaded through syntax-preserving code transformations. Models with 70%+ accuracy on clean code can fail to detect 87%+ of vulnerabilities after minor edits, and adversarial attacks achieve evasion rates of up to 92.5%. The results raise serious questions about the reliability of AI-driven security tools in production environments.

🧠 GPT-4
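
To make the idea of a syntax-preserving transformation concrete, here is a minimal sketch in Python. It assumes identifier renaming as the edit; the summary does not say which transformations the paper actually uses, so the `RenameIdentifiers` class, the `copy_prefix` function, and the name mapping are all hypothetical illustrations. The point is only that the surface text changes while the code still parses and behaves the same.

```python
import ast

class RenameIdentifiers(ast.NodeTransformer):
    """Rename selected identifiers without changing program behaviour."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers variable reads and writes.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

# Hypothetical snippet a detector might be asked to classify.
ORIGINAL = '''
def copy_prefix(buf, n):
    out = []
    for i in range(n):
        out.append(buf[i])
    return out
'''

def transform(source, mapping):
    """Parse, rename, and regenerate source (ast.unparse needs Python 3.9+)."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)

def run(source, *args):
    """Execute the snippet and call copy_prefix with the given arguments."""
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    return namespace["copy_prefix"](*args)

MUTATED = transform(ORIGINAL, {"buf": "v0", "n": "v1", "out": "v2", "i": "v3"})

# The text differs, but both versions compile and return the same result.
same_behaviour = run(ORIGINAL, [1, 2, 3], 2) == run(MUTATED, [1, 2, 3], 2)
```

A detector that keys on surface features such as variable names could label `ORIGINAL` and `MUTATED` differently even though they are functionally identical, which is the failure mode the study describes.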

AI · Bullish · Google AI Blog · Mar 17 · 7/10

Our latest investment in open source security for the AI era

Google announces new investments in open source security for the AI era. The company is developing new tools and code security solutions to address emerging security challenges in AI development.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 3

ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day Vulnerabilities for Cyberdefense

Researchers introduce ZeroDayBench, a new benchmark that tests LLM agents' ability to find and patch 22 critical vulnerabilities in open-source code. Tests on the frontier models GPT-5.2, Claude Sonnet 4.5, and Grok 4.1 show that current LLMs cannot yet solve cybersecurity tasks autonomously, highlighting the limitations of AI-powered code security.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

Researchers conducted a reproducibility study of Vul-RAG, a RAG-based framework for detecting software vulnerabilities with LLMs. The results are reproducible with open-weight models, but performance plateaus at around 0.30 pairwise accuracy regardless of model sophistication. The findings suggest that scaling up model capacity alone does not substantially improve vulnerability detection.
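
For readers unfamiliar with the metric, the sketch below shows one way pairwise accuracy can be computed. It assumes the common definition in which each test case is a (vulnerable, patched) pair of code versions and a pair counts as correct only when the detector flags the vulnerable version and clears the patched one. The summary does not spell out the definition, so treat this as an assumption; the function name and sample predictions are hypothetical.

```python
def pairwise_accuracy(pairs):
    """Fraction of pairs where both versions are classified correctly.

    pairs: iterable of (flagged_vulnerable_version, flagged_patched_version)
    booleans, where True means the detector reported a vulnerability.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    # A pair is correct only if the vulnerable version is flagged
    # and the patched version is not.
    correct = sum(
        1 for vuln_flagged, patched_flagged in pairs
        if vuln_flagged and not patched_flagged
    )
    return correct / len(pairs)

# Hypothetical detector output for four pairs.
predictions = [
    (True, False),   # correct: flags the bug, clears the fix
    (True, True),    # wrong: still flags the patched code
    (False, False),  # wrong: misses the bug
    (True, False),   # correct
]
score = pairwise_accuracy(predictions)
```

Under this definition a detector that flags everything, or nothing, scores zero, so the metric rewards telling a vulnerable function apart from its fix.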

AI · Bullish · Google DeepMind Blog · Oct 23 · 5/10 · 7

Introducing CodeMender: an AI agent for code security

CodeMender is a new AI agent designed to automatically identify and fix critical security vulnerabilities in software code. The tool leverages advanced artificial intelligence capabilities to enhance code security and reduce software risks.