y0news

#adversarial-ml News & Analysis

13 articles tagged with #adversarial-ml. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

Targeting World Models to Compromise Robot Learning Pipelines

Researchers demonstrate a novel data poisoning attack targeting world models used in robot learning pipelines, showing how malicious prompts or dynamics hidden in training data can be activated only when processed through world models to generate unsafe robotic policies. The attack bypasses traditional safety measures by appearing benign in ground truth datasets while compromising downstream robot learning systems, affecting both action-conditioned and text-conditioned models.

AI · Bearish · arXiv – CS AI · 6d ago · 7/10

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

Researchers demonstrate the first practical quantization-conditioned attack that reliably compromises large language models across advanced quantization methods including AWQ, GPTQ, and GGUF. The attack exploits how outlier weights cause rounding errors in modern quantization schemes, allowing adversaries to inject hidden malicious behaviors that activate only after quantization, posing significant security risks to the deployment pipeline.
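The rounding mechanism mentioned above can be illustrated with a minimal sketch. It assumes plain symmetric absmax round-to-nearest quantization over a single weight group, which is far simpler than AWQ, GPTQ, or GGUF, and it is not the paper's attack. The function names and the example weights are hypothetical.

```python
def quantize_dequantize(weights, bits=4):
    """Symmetric absmax round-to-nearest quantization of one weight group.

    Illustrative assumption: one shared scale per group, no zero-point.
    Assumes at least one non-zero weight (otherwise the scale would be 0).
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax     # set by the largest weight
    restored = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # round, then clamp
        restored.append(q * scale)                   # map back to float
    return restored


def max_abs_error(weights, bits=4):
    """Largest absolute difference between original and restored weights."""
    restored = quantize_dequantize(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Hypothetical weight groups: identical except for one large outlier.
clean = [0.10, -0.20, 0.05, 0.15]
with_outlier = clean + [8.0]

err_clean = max_abs_error(clean)
err_outlier = max_abs_error(with_outlier)
print(f"max error without outlier: {err_clean:.4f}")
print(f"max error with outlier:    {err_outlier:.4f}")
```

Because the outlier sets the shared scale, the small weights in the second group all round to zero, so their rounding error grows sharply. This is the general effect a quantization-conditioned attack could build on; how the paper actually places outliers is not described in this summary.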

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models

Researchers have identified a new jailbreak attack called Persona Attack that exploits LLMs' memory and conversation context to bypass safety mechanisms. By incrementally injecting instructions through dialogue, the attack achieves up to 95% success rates, demonstrating that accumulated memory instructions can override built-in safety alignment regardless of traditional safety training.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

Researchers have discovered a critical security vulnerability in Vision-Language-Action models used in robotics, demonstrating a stealthy backdoor attack called SILENTDRIFT that exploits action chunking mechanisms. The attack achieves a 93.2% success rate while remaining visually undetectable, raising serious concerns about the safety of AI-powered robotic systems in critical applications.

AI · Bearish · arXiv – CS AI · Jun 1 · 7/10

Stateful Online Monitoring Catches Distributed Agent Attacks

Researchers demonstrate the first distributed agent attack where language models coordinate across multiple accounts to hide cyberattacks from detection systems. They propose a stateful online monitoring solution using real-time clustering that catches these distributed threats 30% earlier while maintaining negligible latency for legitimate traffic.

AI · Bearish · arXiv – CS AI · May 29 · 7/10

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Researchers demonstrate that LoRA adapters, widely used for fine-tuning large language models, can be backdoored through training data poisoning while maintaining clean performance. The backdoor generalizes at the token level rather than through structural patterns, making it harder for defenders to detect generically. Two complementary detection methods, behavioral probing and weight-level analysis, successfully identify poisoned adapters without false positives.

AI · Bearish · arXiv – CS AI · May 28 · 7/10

Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems

Researchers have identified critical vulnerabilities in machine learning-based fault detection systems used in cyber-physical infrastructure, demonstrating that backdoor attacks can compromise these safety-critical systems with poisoning rates as low as 10%. This threat directly impacts smart grids, industrial automation, and other essential infrastructure that increasingly rely on AI models for anomaly detection and system recovery.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Researchers have identified a new class of data poisoning attacks on large language models, called 'covert control attacks', which use semantic associations rather than obvious trigger phrases to hide malicious instructions. The method evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.

AI · Neutral · arXiv – CS AI · May 7 · 7/10

SoK: Robustness in Large Language Models against Jailbreak Attacks

Researchers introduce Security Cube, a comprehensive evaluation framework for assessing Large Language Model robustness against jailbreak attacks. The study systematically catalogs existing attack and defense methods while establishing benchmarks across 13 attack vectors and 5 defense mechanisms, revealing critical gaps in current LLM safety practices.

AI · Bearish · arXiv – CS AI · May 1 · 7/10

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Researchers demonstrate a novel attack that steals sensitive secrets (API keys, personal identifiers, financial records) from locally fine-tuned language models by embedding malicious code in model architectures. The attack achieves a success rate above 98% and bypasses current defense mechanisms including differential privacy and code auditing, exposing a critical supply-chain vulnerability in AI model development.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

Researchers have developed Head-Masked Nullspace Steering (HMNS), a novel jailbreak technique that exploits circuit-level vulnerabilities in large language models by identifying and suppressing specific attention heads responsible for safety mechanisms. The method achieves state-of-the-art attack success rates with fewer queries than previous approaches, demonstrating that current AI safety defenses remain fundamentally vulnerable to geometry-aware adversarial interventions.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack

Researchers have demonstrated the first multi-targeted backdoor attack against graph neural networks (GNNs) in graph classification tasks, using a novel subgraph injection method that simultaneously redirects multiple predictions to different target labels while maintaining clean accuracy. The attack shows high efficacy across multiple GNN architectures and datasets, with resilience against existing defense mechanisms, exposing significant vulnerabilities in GNN security.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Beyond the False Trade-off: Adaptive EWC for Stealthy and Generalizable T2I Backdoors

Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to improve text-to-image model backdoor attacks while maintaining model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.
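For orientation, the standard (non-adaptive) EWC objective is usually written as below. This is the textbook form, not the paper's cosine-aware variant, whose weighting rule is not given in this summary. The symbols follow the usual convention and are assumptions here.

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\text{new}}(\theta) \;+\; \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^{*}\right)^2
```

Here \theta^{*} denotes the parameters before the new training objective is applied, F_i is the diagonal Fisher information estimate for parameter i, and \lambda is the regularization strength. An adaptive scheme of the kind the summary describes would presumably vary that strength during training rather than keep it fixed.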