#llm-vulnerabilities News & Analysis

12 articles tagged with #llm-vulnerabilities. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

12 articles

AIBearisharXiv – CS AI · 4d ago7/10

🧠

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Researchers present MM-PoisonRAG, a framework demonstrating critical vulnerabilities in multimodal RAG systems where adversaries can inject poisoned content into knowledge bases to manipulate AI outputs. Two attack strategies—localized poisoning targeting specific queries and globalized poisoning affecting all queries—achieve high success rates and bypass existing defenses, exposing fundamental security gaps in RAG-augmented language models.

AIBearisharXiv – CS AI · May 127/10

🧠

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Researchers have identified critical security vulnerabilities in multi-agent AI networks where compromised parent agents can propagate malicious instructions to spawned subagents through inherited memory. The study demonstrates how current LLM frameworks violate trust boundaries via insecure memory inheritance and weak resource controls, turning localized agent compromises into systemic network risks.

🧠 ChatGPT

AIBearisharXiv – CS AI · May 97/10

🧠

LeakDojo: Decoding the Leakage Threats of RAG Systems

LeakDojo is a new research framework that systematically evaluates security vulnerabilities in Retrieval-Augmented Generation (RAG) systems, revealing that stronger LLM instruction-following capabilities correlate with higher data leakage risks. The study benchmarks six attack methods across multiple LLMs and datasets, providing critical insights into how RAG databases can be exploited and suggesting that improvements in RAG faithfulness may paradoxically increase security vulnerabilities.

AIBearisharXiv – CS AI · Apr 147/10

🧠

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Researchers have developed Head-Masked Nullspace Steering (HMNS), a novel jailbreak technique that exploits circuit-level vulnerabilities in large language models by identifying and suppressing specific attention heads responsible for safety mechanisms. The method achieves state-of-the-art attack success rates with fewer queries than previous approaches, demonstrating that current AI safety defenses remain fundamentally vulnerable to geometry-aware adversarial interventions.

AIBearisharXiv – CS AI · Mar 277/10

🧠

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Researchers have developed PIDP-Attack, a new cybersecurity threat that combines prompt injection with database poisoning to manipulate AI responses in Retrieval-Augmented Generation (RAG) systems. The attack method demonstrated 4-16% higher success rates than existing techniques across multiple benchmark datasets and eight different large language models.

AIBearisharXiv – CS AI · Mar 267/10

🧠

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Researchers have discovered a new black-box attack method called Tree structured Injection for Payloads (TIP) that can compromise AI agents using Model Context Protocol with over 95% success rate. The attack exploits vulnerabilities in how large language models interact with external tools, bypassing existing defenses and requiring significantly fewer queries than previous methods.

AIBearisharXiv – CS AI · Mar 57/10

🧠

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

Researchers demonstrate a novel backdoor attack method called 'SFT-then-GRPO' that can inject hidden malicious behavior into AI agents while maintaining their performance on standard benchmarks. The attack creates 'sleeper agents' that appear benign but can execute harmful actions under specific trigger conditions, highlighting critical security vulnerabilities in the adoption of third-party AI models.

AINeutralarXiv – CS AI · Feb 277/103

🧠

Manifold of Failure: Behavioral Attraction Basins in Language Models

Researchers developed a new framework called MAP-Elites to systematically map vulnerability regions in Large Language Models, revealing distinct safety landscape patterns across different models. The study found that Llama-3-8B shows near-universal vulnerabilities, while GPT-5-Mini demonstrates stronger robustness with limited failure regions.

$NEAR

AINeutralarXiv – CS AI · 4d ago6/10

🧠

Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

Researchers present SLOT, a comprehensive taxonomy for understanding security vulnerabilities in retrieval-augmented generation (RAG) systems that extend LLMs with external knowledge. The framework categorizes attacks and defenses across four dimensions—attack surface, defense layer, security objective, and target scope—while identifying structural gaps in current evaluation methods and proposing future research directions for securing RAG pipelines.

AINeutralarXiv – CS AI · May 126/10

🧠

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning

Researchers reveal that large language models suffer from a nonlinear performance degradation when exposed to misleading information in long-context scenarios, with the majority of decline occurring when hard distractors comprise just a small fraction of the total context. This finding, termed 'The First Drop of Ink' effect, demonstrates that attention mechanisms disproportionately focus on misleading content, suggesting that upstream retrieval quality is more critical than previously understood for RAG and agentic systems.

AIBearisharXiv – CS AI · Apr 66/10

🧠

LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation

Researchers have discovered LogicPoison, a new attack method that exploits vulnerabilities in Graph-based Retrieval-Augmented Generation (GraphRAG) systems by corrupting logical connections in knowledge graphs without altering text semantics. The attack successfully bypasses GraphRAG's existing defenses by targeting the topological integrity of underlying graphs, significantly degrading AI system performance.

AIBearisharXiv – CS AI · Mar 37/107

🧠

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

Researchers developed 'Reverse CAPTCHA,' a framework that tests how large language models respond to invisible Unicode-encoded instructions embedded in normal text. The study found that AI models can follow hidden instructions that humans cannot see, with tool use dramatically increasing compliance rates and different AI providers showing distinct preferences for encoding schemes.