12 articles tagged with #backdoor-attacks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBearisharXiv โ CS AI ยท 3d ago7/10
๐ง Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.
AIBearisharXiv โ CS AI ยท 4d ago7/10
๐ง Researchers demonstrate BadSkill, a backdoor attack that exploits AI agent ecosystems by embedding malicious logic in seemingly benign third-party skills. The attack achieves up to 99.5% success rate by poisoning bundled model artifacts to activate hidden payloads when specific trigger conditions are met, revealing a critical supply-chain vulnerability in extensible AI systems.
AIBearisharXiv โ CS AI ยท Apr 107/10
๐ง Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.
๐ง GPT-5
AIBullisharXiv โ CS AI ยท Mar 177/10
๐ง Researchers developed a new framework to remove backdoors from large language models without prior knowledge of triggers or clean reference models. The method uses an immunization-inspired approach that creates synthetic backdoored variants to identify and neutralize malicious components while preserving the model's generative capabilities.
AIBearisharXiv โ CS AI ยท Mar 57/10
๐ง Researchers demonstrate a novel backdoor attack method called 'SFT-then-GRPO' that can inject hidden malicious behavior into AI agents while maintaining their performance on standard benchmarks. The attack creates 'sleeper agents' that appear benign but can execute harmful actions under specific trigger conditions, highlighting critical security vulnerabilities in the adoption of third-party AI models.
AIBearisharXiv โ CS AI ยท Mar 56/10
๐ง Researchers have discovered that model architecture significantly affects the success of backdoor attacks in federated learning systems. The study introduces new metrics to measure model vulnerability and develops a framework showing that certain network structures can amplify malicious perturbations even with minimal poisoning.
AIBearisharXiv โ CS AI ยท Mar 47/103
๐ง Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.
AIBearisharXiv โ CS AI ยท Feb 277/105
๐ง Researchers demonstrate how training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.
AIBearisharXiv โ CS AI ยท Feb 277/103
๐ง Researchers have developed DropVLA, a backdoor attack method that can manipulate Vision-Language-Action AI models to execute unintended robot actions while maintaining normal performance. The attack achieves 98.67%-99.83% success rates with minimal data poisoning and has been validated on real robotic systems.
AINeutralarXiv โ CS AI ยท 3d ago6/10
๐ง Researchers introduce Critical-CoT, a defense framework that protects large language models against reasoning-level backdoor attacks by fine-tuning models to develop critical thinking behaviors. Unlike token-level backdoors, these attacks inject malicious reasoning steps into chain-of-thought processes, making them harder to detect; the proposed defense demonstrates strong robustness across multiple LLMs and datasets.
AINeutralarXiv โ CS AI ยท Mar 126/10
๐ง Researchers propose TASER, a new defense framework against backdoor attacks in UAV-based decentralized federated learning systems. The system uses spectral energy analysis rather than traditional outlier detection, achieving below 20% attack success rates while maintaining accuracy within 5% loss.
AIBullisharXiv โ CS AI ยท Mar 37/108
๐ง Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns - periods of abnormally low token probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.
$NEAR