y0news

#prompt-injection News & Analysis

76 articles tagged with #prompt-injection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 15h ago · 7/10

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

Researchers have demonstrated that agentic AI systems used for software reverse engineering are vulnerable to prompt injection attacks embedded in executable binaries, and have developed both offensive obfuscation techniques and defensive detection methods. This research highlights critical security gaps in AI-powered code analysis tools that organizations are beginning to deploy in production environments.

AI · Bearish · arXiv – CS AI · 15h ago · 7/10

The Surface You Test Is Not the Surface That Breaks

Researchers demonstrate that LLM agent vulnerability to prompt injection varies dramatically with the injection surface: the same attack payload showed 96% success on one model via tool outputs but only 4% via tool descriptions. The study finds that vulnerability is determined by the interaction between model and surface rather than by the injection channel alone, exposing critical blind spots in current AI security evaluation methodology.

🧠 GPT-4
AI · Bearish · arXiv – CS AI · 15h ago · 7/10

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Researchers identified that indirect prompt injection attacks against ReAct AI agents succeed at dramatically different rates depending on where malicious payloads appear in tool sequences, with success rates dropping from 60% at the first tool observation to 0% at deeper positions. The study reveals that payload framing and conversation turn limits have minimal impact on attack success, making injection depth the critical vulnerability factor for AI agent systems handling real-world tasks.

🧠 GPT-4 · 🧠 Claude
AI · Bearish · arXiv – CS AI · 15h ago · 7/10

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Researchers reveal a critical vulnerability in LLM agents operating in local workspaces, where attackers can plant hidden prompt injections across multiple steps to gain persistent control. The new ClawTrojan benchmark demonstrates 95.5% attack success rates against GPT-5.4, while a proposed defense mechanism called DASGuard offers runtime protection by tracing and sanitizing potentially malicious control text in sensitive files.

🧠 GPT-5
AI · Bearish · arXiv – CS AI · 15h ago · 7/10

Automatically Attacking Software Reverse Engineering AI Agents

Researchers demonstrate a novel adversarial attack using genetic algorithm-based prompt injection that can deceive LLM-powered reverse engineering tools like GhidraMCP into misinterpreting binary executables. This vulnerability exploits how large language models process decompiled code through surreptitious string variable assignments, potentially allowing malware to bypass automated detection systems that rely on AI-driven analysis.

AI · Bearish · Decrypt – AI · 2d ago · 7/10

What Is an AI Prompt Injection Attack? The Hidden Threat Hijacking Your Chatbots

Prompt injection attacks allow hackers to manipulate AI chatbots like ChatGPT, Claude, and Gemini through adversarial text inputs, potentially hijacking their behavior and outputs. OpenAI has indicated this vulnerability may be inherent to large language models and difficult to fully eliminate, raising significant security concerns for enterprises and individual users relying on these systems.

🏢 OpenAI · 🧠 ChatGPT · 🧠 Claude
AI · Bearish · arXiv – CS AI · 3d ago · 7/10

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Researchers introduce SafeSearch, an automated red-teaming framework that identifies critical vulnerabilities in LLM-based search agents by testing them against 300 adversarial cases spanning misinformation, prompt injection, and other risks. The study reports attack success rates of up to 90.5% against current search agents, with common defenses like reminder prompting providing minimal protection.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Controlling the Risk of Corrupted Contexts for Language Models via Early-Exiting

Researchers propose a technique that uses early-exit mechanisms and distribution-free risk control to keep large language model performance from degrading when the model is exposed to harmful or irrelevant context. The approach preserves a baseline (zero-shot) performance level while selectively leveraging helpful inputs for efficiency gains, and it proves effective across multiple language tasks.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

A comprehensive arXiv research review examines vulnerabilities in Large Language Models, particularly prompt injection and jailbreaking attacks, while analyzing existing defense mechanisms. The study identifies critical security gaps and proposes future research directions for safer LLM deployment across applications.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

Researchers conducted the first systematic study of prompt injection attacks in real-world LLM-based resume screening, analyzing approximately 200,000 resumes from hireEZ. They found that ~1% of resumes contain hidden prompt injections, with prevalence increasing significantly over the past 1-2 years, and discovered that over 90% of injected prompts use subtle methods rather than explicit instructions.

AI · Bearish · Ars Technica – AI · 3d ago · 7/10

Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code

A developer embedded a prompt injection attack into the jqwik library that instructed AI coding agents to delete application output, highlighting vulnerabilities in AI-assisted development tools. The incident reveals how malicious actors can compromise open-source projects to target AI systems, creating risks for developers relying on autonomous coding agents.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

Refusal Before Decoding: Detecting and Exploiting Refusal Signals in Intermediate LLM Activations

Researchers demonstrate that large language model refusal behavior can be detected and exploited through intermediate layer activations before final output generation. A new attack method called Mechanistic AutoDAN leverages this discovery to achieve competitive jailbreak success rates while reducing computational time by up to 72%, raising concerns about LLM safety mechanisms.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Researchers propose the Adversarial Prompt Disentanglement (APD) framework, a defense mechanism that identifies and neutralizes malicious components in LLM inputs before processing. The system combines semantic decomposition, graph-based intent classification, and transformer-based detection to reduce harmful outputs by over 85% while maintaining model performance.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Researchers demonstrate MIRAGE, a technique that exploits vision-language model vulnerabilities in mobile GUI agents by injecting adversarial text into user-generated content regions. The attack achieves 23-30% success rates across five VLM agents without modifying apps or operating systems, revealing a critical security gap in AI-powered mobile automation that existing visual-quality defenses cannot reliably prevent.

AI · Bearish · arXiv – CS AI · 5d ago · 7/10

Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models

Researchers have developed BEAP, a black-box adversarial attack that bypasses machine unlearning safeguards in text-to-image diffusion models by generating natural-language prompts that evade detection filters. The attack achieves 60% higher success rates than previous methods while remaining undetectable to safety systems, raising critical questions about the robustness of AI model safety mechanisms.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

Researchers have discovered WebTrap, a sophisticated prompt injection attack that can stealthily hijack browser-based AI agents during extended tasks by seamlessly blending malicious instructions with legitimate user goals. The attack maintains system usability while achieving high success rates, exposing critical vulnerabilities in autonomous agent systems that current defense mechanisms cannot adequately address.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Researchers have identified critical security vulnerabilities in multi-agent AI networks where compromised parent agents can propagate malicious instructions to spawned subagents through inherited memory. The study demonstrates how current LLM frameworks violate trust boundaries via insecure memory inheritance and weak resource controls, turning localized agent compromises into systemic network risks.

🧠 ChatGPT
AI · Neutral · arXiv – CS AI · May 12 · 7/10

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

Researchers identify a critical vulnerability in agentic memory systems where Large Language Models retrieve and amplify spurious correlations from stored information, leading to erroneous reasoning in downstream decisions. The study benchmarks this risk and proposes CAMEL, a lightweight calibration method that mitigates spurious pattern reliance while maintaining performance on clean data.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models

Researchers developed a testing framework to study "political plasticity"—how Large Language Models adapt their ideological responses based on user context. The study found that newer, larger LLMs reliably shift responses along economic and personal freedom axes when prompted with few-shot examples, while older models show limited adaptability, raising concerns about potential data leakage and model reliability.

AI · Bearish · arXiv – CS AI · May 9 · 7/10

LoopTrap: Termination Poisoning Attacks on LLM Agents

Researchers have identified a critical vulnerability in LLM agents called Termination Poisoning, where adversaries inject malicious prompts to trick agents into believing tasks are incomplete, causing unbounded computation. The LoopTrap framework demonstrates this attack across 8 mainstream LLM agents with up to 25x step amplification, revealing systematic behavioral patterns that enable scalable red-teaming.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

Researchers introduce Sentra-Guard, a real-time defense system that detects and mitigates jailbreak and prompt injection attacks on large language models with 99.96% accuracy. The multilingual framework combines FAISS-indexed semantic embeddings with fine-tuned transformers and human-in-the-loop feedback, significantly outperforming existing defenses like LlamaGuard-2 and OpenAI Moderation.

🏢 OpenAI
AI · Neutral · arXiv – CS AI · May 1 · 7/10

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Researchers demonstrate that multi-turn prompt injection attacks leave detectable signatures in language model activation patterns, achieving 93.8% detection accuracy through analysis of residual stream trajectories. The approach reveals that adversarial attack sequences exhibit distinctive 'restlessness' patterns across model architectures, though detection effectiveness varies significantly when deployed on real-world data.

AI · Bearish · arXiv – CS AI · Apr 20 · 7/10

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

Researchers have identified that 4.93% of skills in major LLM agent ecosystems are harmful and can be weaponized for cyberattacks, fraud, and privacy violations. The study reveals that presenting harmful tasks through pre-installed skills dramatically reduces AI model refusal rates, with harm scores increasing from 0.27 to 0.76 when intent is implicit rather than explicit.

AI · Bearish · arXiv – CS AI · Apr 15 · 7/10

TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs

Researchers introduce TEMPLATEFUZZ, a fuzzing framework that systematically exploits vulnerabilities in LLM chat templates—a previously overlooked attack surface. The method achieves 98.2% jailbreak success rates on open-source models and 90% on commercial LLMs, significantly outperforming existing prompt injection techniques while revealing critical security gaps in production AI systems.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

Researchers have identified a novel jailbreaking vulnerability in LLMs called 'Salami Slicing Risk,' where attackers chain multiple low-risk inputs that individually bypass safety measures but cumulatively trigger harmful outputs. The Salami Attack framework demonstrates over 90% success rates against GPT-4o and Gemini, highlighting a critical gap in current multi-turn defense mechanisms that assume individual requests are adequately monitored.

🧠 GPT-4 · 🧠 Gemini
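
The cumulative-risk idea in the Salami Slicing summary can be illustrated with a toy monitor. This is only a sketch under stated assumptions, not the paper's method: the per-turn risk scores, both thresholds, and the names `ConversationRiskMonitor` and `decisions` are hypothetical. It shows how a filter that checks each message in isolation can pass a sequence that a running total would flag.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationRiskMonitor:
    """Toy monitor: tracks a running risk total across conversation turns."""

    per_turn_limit: float = 0.5    # hypothetical threshold for one message
    cumulative_limit: float = 1.0  # hypothetical threshold for the whole chat
    total: float = field(default=0.0, init=False)

    def check(self, turn_risk: float) -> str:
        """Return 'block' if this turn or the running total exceeds a limit."""
        self.total += turn_risk
        if turn_risk > self.per_turn_limit:
            return "block"
        if self.total > self.cumulative_limit:
            return "block"
        return "allow"


def decisions(scores, monitor):
    """Run a sequence of (hypothetical) per-turn risk scores through a monitor."""
    return [monitor.check(score) for score in scores]


if __name__ == "__main__":
    slices = [0.3, 0.3, 0.3, 0.3]  # each slice is below the per-turn limit
    per_turn_only = ConversationRiskMonitor(cumulative_limit=float("inf"))
    print(decisions(slices, per_turn_only))
    print(decisions(slices, ConversationRiskMonitor()))
```

With these made-up numbers, the per-turn-only monitor allows all four messages, while the cumulative monitor blocks the fourth once the running total passes its limit.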