y0news

#backdoor-attacks News & Analysis

12 articles tagged with #backdoor-attacks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.

AI · Bearish · arXiv – CS AI · 4d ago · 7/10

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Researchers demonstrate BadSkill, a backdoor attack that exploits AI agent ecosystems by embedding malicious logic in seemingly benign third-party skills. The attack achieves a success rate of up to 99.5% by poisoning bundled model artifacts so that hidden payloads activate when specific trigger conditions are met, revealing a critical supply-chain vulnerability in extensible AI systems.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.

GPT-5
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Researchers developed a new framework to remove backdoors from large language models without prior knowledge of triggers or clean reference models. The method uses an immunization-inspired approach that creates synthetic backdoored variants to identify and neutralize malicious components while preserving the model's generative capabilities.

AI · Bearish · arXiv – CS AI · Mar 5 · 7/10

Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

Researchers demonstrate a novel backdoor attack method called 'SFT-then-GRPO' that can inject hidden malicious behavior into AI agents while maintaining their performance on standard benchmarks. The attack creates 'sleeper agents' that appear benign but can execute harmful actions under specific trigger conditions, highlighting critical security vulnerabilities in the adoption of third-party AI models.

AI · Bearish · arXiv – CS AI · Mar 5 · 6/10

Structure-Aware Distributed Backdoor Attacks in Federated Learning

Researchers have discovered that model architecture significantly affects the success of backdoor attacks in federated learning systems. The study introduces new metrics to measure model vulnerability and develops a framework showing that certain network structures can amplify malicious perturbations even with minimal poisoning.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 3

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves a 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.

AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 5

Poisoned Acoustics

Researchers demonstrate how training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving a 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.
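
As a rough illustration of what "blockchain-like verification" of a training set could look like, the sketch below chains SHA-256 digests over dataset records so that altering any record changes its digest and every later one. This is not the paper's scheme: the function names, the genesis value, and the record format are all hypothetical, and the post-quantum signature step is omitted because the Python standard library has no such primitive. A hash chain only detects changes made after the trusted digests were recorded.

```python
import hashlib

# Hypothetical starting value for the chain (an assumption, not from the paper).
GENESIS = "0" * 64


def chain_digests(records, genesis=GENESIS):
    """Return one hex digest per record, each covering the previous digest."""
    digests = []
    prev = genesis
    for record in records:
        # Each link hashes the previous digest together with the record bytes.
        digest = hashlib.sha256(bytes.fromhex(prev) + record).hexdigest()
        digests.append(digest)
        prev = digest
    return digests


def first_tampered_index(records, trusted_digests):
    """Return the index of the first mismatching record, or None if all match."""
    recomputed = chain_digests(records)
    for i, (got, want) in enumerate(zip(recomputed, trusted_digests)):
        if got != want:
            return i
    # Records were appended or dropped relative to the trusted chain.
    if len(records) != len(trusted_digests):
        return min(len(records), len(trusted_digests))
    return None


if __name__ == "__main__":
    clean = [b"clip-001:car", b"clip-002:truck", b"clip-003:bus"]
    trusted = chain_digests(clean)
    poisoned = [b"clip-001:car", b"clip-002:car", b"clip-003:bus"]
    print(first_tampered_index(clean, trusted))
    print(first_tampered_index(poisoned, trusted))
```

In practice the final digest would itself need to be signed and stored somewhere the attacker cannot modify; that part is outside this sketch.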

AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 3

DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

Researchers have developed DropVLA, a backdoor attack method that can manipulate Vision-Language-Action AI models into executing unintended robot actions while maintaining normal performance. The attack achieves success rates of 98.67% to 99.83% with minimal data poisoning and has been validated on real robotic systems.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

Researchers introduce Critical-CoT, a defense framework that protects large language models against reasoning-level backdoor attacks by fine-tuning models to develop critical thinking behaviors. Unlike token-level backdoors, these attacks inject malicious reasoning steps into chain-of-thought processes, making them harder to detect; the proposed defense demonstrates strong robustness across multiple LLMs and datasets.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns: periods of abnormally low token probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.
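
To make the idea of a low-entropy stretch concrete, here is a minimal sketch that computes Shannon entropy per generation step and flags runs of consecutive low-entropy steps. It is not DualSentinel's algorithm and does not implement its dual-check verification. The function names, the `threshold` and `min_run` values, and the assumption that a full probability distribution is available at each step are all hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def find_entropy_lulls(step_probs, threshold=0.1, min_run=3):
    """Return (start, end) pairs, end exclusive, for each run of at least
    min_run consecutive steps whose entropy is below threshold."""
    lulls = []
    run_start = None
    for i, probs in enumerate(step_probs):
        if token_entropy(probs) < threshold:
            if run_start is None:
                run_start = i  # a low-entropy run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                lulls.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(step_probs) - run_start >= min_run:
        lulls.append((run_start, len(step_probs)))
    return lulls


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]      # high entropy, about 1.386 nats
    peaked = [0.997, 0.001, 0.001, 0.001]   # low entropy, about 0.024 nats
    steps = [uniform, peaked, peaked, peaked, uniform, peaked]
    print(find_entropy_lulls(steps))
```

Ordinary text also contains short confident stretches, so a fixed threshold like this one would need calibration on benign outputs before it could serve as a detector.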
