#backdoor-attacks News & Analysis

22 articles tagged with #backdoor-attacks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

22 articles

AIBearisharXiv – CS AI · 16h ago7/10

🧠

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Researchers reveal a critical vulnerability in LLM agents operating in local workspaces, where attackers can plant hidden prompt injections across multiple steps to gain persistent control. The new ClawTrojan benchmark demonstrates 95.5% attack success rates against GPT-5.4, while a proposed defense mechanism called DASGuard offers runtime protection by tracing and sanitizing potentially malicious control text in sensitive files.

🧠 GPT-5

AIBearisharXiv – CS AI · 3d ago7/10

🧠

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Researchers present MemPoison, a novel attack that exploits vulnerabilities in large language model agents by injecting malicious information into their long-term memory through dialogue interactions. The attack achieves up to 95% success rates by using semantic bridges, entity masquerading, and embedding optimization to bypass modern selective memory mechanisms, revealing critical security gaps in autonomous AI systems.

AIBearisharXiv – CS AI · 3d ago7/10

🧠

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Researchers demonstrate that LoRA adapters, widely used for fine-tuning large language models, can be backdoored through training data poisoning while maintaining clean performance. The backdoor generalizes at the token level rather than structural patterns, making it harder for defenders to detect generically. Two complementary detection methods—behavioral probing and weight-level analysis—successfully identify poisoned adapters without false positives.

AIBearisharXiv – CS AI · 4d ago7/10

🧠

Can Quantum Federated Learning Withstand Circuit-Level Backdoors?

Researchers identify critical vulnerabilities in Quantum Federated Learning (QFL) systems through a novel Circuit-Level Backdoor Threat (CULT) model that demonstrates how malicious clients can exploit quantum mechanisms to degrade model accuracy. Existing defense mechanisms fail to fully prevent attacks, with accuracy dropping up to 50% even against popular mitigation strategies like Krum and FLGuardian.

AIBearisharXiv – CS AI · 4d ago7/10

🧠

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

Researchers have identified critical vulnerabilities in machine learning-based fault detection systems used in cyber-physical infrastructure, demonstrating that backdoor attacks can compromise these safety-critical systems with poisoning rates as low as 10%. This threat directly impacts smart grids, industrial automation, and other essential infrastructure that increasingly rely on AI models for anomaly detection and system recovery.

AIBearisharXiv – CS AI · 5d ago7/10

🧠

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Researchers have identified a new data poisoning vulnerability in large language models called 'covert control attacks' that uses semantic associations to hide malicious instructions rather than obvious trigger phrases. This method successfully evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.

AIBullisharXiv – CS AI · May 97/10

🧠

DeTrigger: A Gradient-Centric Approach to Backdoor Attack Mitigation in Federated Learning

DeTrigger is a new federated learning framework that uses gradient analysis to detect and neutralize backdoor attacks in distributed machine learning systems. The approach achieves 251x faster detection than existing methods while mitigating 98.9% of backdoor attacks with minimal accuracy loss, addressing a critical vulnerability in privacy-preserving collaborative AI training.

AIBearisharXiv – CS AI · May 77/10

🧠

Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Researchers present Sparse Backdoor, a supply-chain attack that embeds undetectable backdoors into pre-trained image classifiers by injecting sparse perturbations masked with Gaussian noise. The attack is proven computationally infeasible to distinguish from original models under standard hardness assumptions, raising critical security concerns for AI model deployment and verification.

AIBearisharXiv – CS AI · Apr 147/10

🧠

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.

AIBearisharXiv – CS AI · Apr 137/10

🧠

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Researchers demonstrate BadSkill, a backdoor attack that exploits AI agent ecosystems by embedding malicious logic in seemingly benign third-party skills. The attack achieves up to 99.5% success rate by poisoning bundled model artifacts to activate hidden payloads when specific trigger conditions are met, revealing a critical supply-chain vulnerability in extensible AI systems.

AIBearisharXiv – CS AI · Apr 107/10

🧠

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.

🧠 GPT-5

AIBullisharXiv – CS AI · Mar 177/10

🧠

Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Researchers developed a new framework to remove backdoors from large language models without prior knowledge of triggers or clean reference models. The method uses an immunization-inspired approach that creates synthetic backdoored variants to identify and neutralize malicious components while preserving the model's generative capabilities.

AIBearisharXiv – CS AI · Mar 57/10

🧠

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

Researchers demonstrate a novel backdoor attack method called 'SFT-then-GRPO' that can inject hidden malicious behavior into AI agents while maintaining their performance on standard benchmarks. The attack creates 'sleeper agents' that appear benign but can execute harmful actions under specific trigger conditions, highlighting critical security vulnerabilities in the adoption of third-party AI models.

AIBearisharXiv – CS AI · Mar 56/10

🧠

Structure-Aware Distributed Backdoor Attacks in Federated Learning

Researchers have discovered that model architecture significantly affects the success of backdoor attacks in federated learning systems. The study introduces new metrics to measure model vulnerability and develops a framework showing that certain network structures can amplify malicious perturbations even with minimal poisoning.

AIBearisharXiv – CS AI · Mar 47/103

🧠

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.

AIBearisharXiv – CS AI · Feb 277/105

🧠

Poisoned Acoustics

Researchers demonstrate how training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.

AIBearisharXiv – CS AI · Feb 277/103

🧠

DropVLA: An Action-Level Backdoor Attack on Vision--Language--Action Models

Researchers have developed DropVLA, a backdoor attack method that can manipulate Vision-Language-Action AI models to execute unintended robot actions while maintaining normal performance. The attack achieves 98.67%-99.83% success rates with minimal data poisoning and has been validated on real robotic systems.

AINeutralarXiv – CS AI · May 126/10

🧠

Beyond the False Trade-off: Adaptive EWC for Stealthy and Generalizable T2I Backdoors

Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to improve text-to-image model backdoor attacks while maintaining model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.

AIBearisharXiv – CS AI · May 46/10

🧠

BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron

Researchers have developed BadSNN, a novel backdoor attack method targeting Spiking Neural Networks by exploiting hyperparameter variations in spiking neurons. The attack demonstrates superior performance compared to existing backdoor methods and shows resistance to current mitigation techniques, raising security concerns for SNNs used in edge computing and neuromorphic applications.

AINeutralarXiv – CS AI · Apr 146/10

🧠

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Researchers introduce Critical-CoT, a defense framework that protects large language models against reasoning-level backdoor attacks by fine-tuning models to develop critical thinking behaviors. Unlike token-level backdoors, these attacks inject malicious reasoning steps into chain-of-thought processes, making them harder to detect; the proposed defense demonstrates strong robustness across multiple LLMs and datasets.

AINeutralarXiv – CS AI · Mar 126/10

🧠

TASER: Task-Aware Spectral Energy Refine for Backdoor Suppression in UAV Swarms Decentralized Federated Learning

Researchers propose TASER, a new defense framework against backdoor attacks in UAV-based decentralized federated learning systems. The system uses spectral energy analysis rather than traditional outlier detection, achieving below 20% attack success rates while maintaining accuracy within 5% loss.

AIBullisharXiv – CS AI · Mar 37/108

🧠

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns - periods of abnormally low token probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.

$NEAR