#code-analysis News & Analysis

21 articles tagged with #code-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

Researchers introduce VULPO, an on-policy LLM optimization framework for vulnerability detection that achieves 203% improvement over baseline models by incorporating context-aware reasoning and multidimensional reward signals. The approach combines a new ContextVul dataset with specialized fine-tuning to create more effective security analysis tools that reason through complex code interactions.

AI · Bearish · arXiv – CS AI · May 11 · 7/10

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

A comprehensive survey of 87 machine learning vulnerability detection studies reveals that the field has stalled despite a decade of research, trapped in self-reinforcing feedback loops that optimize for narrow, artificial problems. Researchers identify twelve interconnected pain points spanning datasets, formulations, metrics, and evaluation approaches that perpetuate focus on binary C/C++ function-level classification while neglecting vulnerability type prediction, multilingual support, and broader detection granularities.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects

Researchers evaluated Cursor, an AI-powered IDE, on its ability to generate large-scale software projects and found it achieves 91% functional correctness but produces significant design issues including code duplication, complexity violations, and framework best-practice breaches that threaten long-term maintainability.

AI × Crypto · Bearish · DL News · Mar 26 · 7/10

Crypto hackers armed with AI stand to make millions of dollars attacking old code

Cybercriminals are using AI language models like ChatGPT and Claude to rapidly scan thousands of lines of code per second, identifying vulnerabilities in legacy systems. This represents a significant escalation in automated hacking capabilities, potentially exposing millions of dollars' worth of cryptocurrency assets to sophisticated AI-powered attacks.

🧠 ChatGPT 🧠 Claude

AI · Bearish · arXiv – CS AI · Mar 12 · 7/10

MCP-in-SoS: Risk assessment framework for open-source MCP servers

Researchers have developed a risk assessment framework for open-source Model Context Protocol (MCP) servers, revealing significant security vulnerabilities through static code analysis. The study found many MCP servers contain exploitable weaknesses that compromise confidentiality, integrity, and availability, highlighting the need for secure-by-design development as these tools become widely adopted for LLM agents.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Evaluating LLMs for Real-World Web Vulnerability Detection

Researchers benchmarked six large language models on their ability to detect real-world web vulnerabilities in WordPress plugins, finding that while all models can identify security issues, detection rates vary significantly (35-63%) and no model maintains consistent results across repeated tests. The findings reveal both the promise and critical limitations of LLM-based vulnerability detection for security practitioners.

🧠 GPT-5 🧠 Claude 🧠 Opus

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Revelio: Cost-Efficient Agentic Memory Safety Vulnerability Detection For Repository-Scale Codebases

Revelio is a new AI-powered framework that detects memory safety vulnerabilities in large codebases using large language models combined with executable proof-of-concept generation and deterministic sanitizer verification. The system discovered 19 previously unknown vulnerabilities in production projects while maintaining cost-efficiency, addressing the hallucination problem endemic to LLM-based security analysis.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Multi-View Decompilation for LLM-Based Malware Classification

Researchers demonstrate that using multiple decompilers (Ghidra and RetDec) with large language models improves malware classification accuracy compared to single-decompiler approaches. By providing complementary pseudo-C views of the same binary, the multi-view strategy increases recall on malicious samples without requiring additional training, offering a practical enhancement for LLM-based malware triage.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults

Researchers introduce LinuxFLBench, a fault localization benchmark for Linux kernel bugs, and demonstrate that current LLM agents struggle with this complex task, achieving only 41.6% accuracy. They propose LinuxFL+, an enhancement framework that improves accuracy by 7.2-11.2% across all tested agents, addressing a critical gap in software debugging automation.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

Researchers introduce VulTriage, an LLM-based framework that enhances vulnerability detection in source code through triple-path context augmentation combining control flow analysis, vulnerability knowledge retrieval, and semantic summarization. The approach achieves state-of-the-art results on benchmark datasets and demonstrates strong generalization to low-resource scenarios.

AI · Neutral · arXiv – CS AI · Apr 7 · 6/10

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

A study finds that when Claude Opus 4.6 deobfuscates JavaScript code, poisoned identifier names from the original string table consistently survive in the reconstructed code, even when the model demonstrates correct understanding of the code's semantics. Changing the task framing from 'deobfuscate' to 'write fresh implementation' significantly reduced this persistence while maintaining algorithmic accuracy.

🧠 Claude 🧠 Haiku 🧠 Opus

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Researchers introduce TRAJEVAL, a diagnostic framework that breaks down AI code agent performance into three stages (search, read, edit) to identify specific failure points rather than just binary pass/fail outcomes. The framework analyzed 16,758 trajectories and found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.

🧠 GPT-5

AI · Bullish · The Register – AI · Mar 26 · 7/10

AI bug reports went from junk to legit overnight, says Linux kernel czar

Linux kernel czar Linus Torvalds reports that AI-generated bug reports have dramatically improved in quality, transforming from mostly useless submissions to legitimate and valuable contributions overnight. This represents a significant milestone in AI's ability to assist with complex software development and code analysis tasks.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Researchers conducted the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multi-task code analysis, showing that a single PEFT module can match full fine-tuning performance while reducing computational costs by up to 85%. The study found that even 1B-parameter models with multi-task PEFT outperform large general-purpose LLMs like DeepSeek and CodeLlama on code analysis tasks.

AI · Bullish · OpenAI News · Mar 6 · 5/10

Codex Security: now in research preview

Codex Security, an AI-powered application security agent, has launched in research preview to help developers detect, validate, and patch complex vulnerabilities. The tool analyzes project context to provide more accurate security assessments with reduced false positives.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Theory of Code Space: Do Code Agents Understand Software Architecture?

Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

FastCode: Fast and Cost-Efficient Code Understanding and Reasoning

Researchers introduce FastCode, a new framework for AI-assisted software engineering that improves code understanding and reasoning efficiency. The system uses structural scouting to navigate codebases without full-text ingestion, significantly reducing computational costs while maintaining accuracy across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Agentic Code Reasoning

Researchers introduce 'semi-formal reasoning' for LLM agents to analyze code semantics without execution, showing significant accuracy improvements across multiple tasks. The methodology achieves 88-93% accuracy on patch verification and 87% on code question answering, potentially enabling practical applications in automated code review and static analysis.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

OBsmith: LLM-Powered JavaScript Obfuscator Testing

Researchers introduce OBsmith, an LLM-powered framework that tests JavaScript obfuscators for correctness bugs that can silently alter program functionality. The tool discovered 11 previously unknown bugs that existing JavaScript fuzzers failed to detect, highlighting critical gaps in obfuscation quality assurance.

AI · Neutral · arXiv – CS AI · Apr 20 · 5/10

Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

Researchers demonstrate that Chain-of-Thought prompting significantly improves large language models' ability to deobfuscate control flow code, with GPT-5 achieving 16-20% performance gains over zero-shot prompting. The approach offers a potential alternative to expensive manual reverse engineering, though practical deployment remains limited to research benchmarks.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Apr 7 · 5/10

Measuring LLM Trust Allocation Across Conflicting Software Artifacts

Researchers developed TRACE, a framework to evaluate how LLMs allocate trust between conflicting software artifacts like code, documentation, and tests. The study found that current LLMs are better at identifying natural-language specification issues than detecting subtle code-level problems, with models showing systematic blind spots when implementations drift while documentation remains plausible.