AIBearisharXiv – CS AI · 1d ago7/10
🧠Researchers demonstrate a novel data poisoning attack targeting world models used in robot learning pipelines, showing how malicious prompts or dynamics hidden in training data can be activated only when processed through world models to generate unsafe robotic policies. The attack bypasses traditional safety measures by appearing benign in ground truth datasets while compromising downstream robot learning systems, affecting both action-conditioned and text-conditioned models.
AIBearisharXiv – CS AI · 6d ago7/10
🧠Researchers demonstrate the first practical quantization-conditioned attack that reliably compromises large language models across advanced quantization methods including AWQ, GPTQ, and GGUF. The attack exploits how outlier weights cause rounding errors in modern quantization schemes, allowing adversaries to inject hidden malicious behaviors that activate only after quantization, posing significant security risks to the deployment pipeline.
AIBearisharXiv – CS AI · Jun 27/10
🧠Researchers have identified a new jailbreak attack called Persona Attack that exploits LLMs' memory and conversation context to bypass safety mechanisms. By incrementally injecting instructions through dialogue, the attack achieves up to 95% success rates, demonstrating that accumulated memory instructions can override built-in safety alignment regardless of traditional safety training.
AIBearisharXiv – CS AI · Jun 27/10
🧠Researchers have discovered a critical security vulnerability in Vision-Language-Action models used in robotics, demonstrating a stealthy backdoor attack called SILENTDRIFT that exploits action chunking mechanisms. The attack achieves 93.2% success rate while remaining visually undetectable, raising serious concerns about the safety of AI-powered robotic systems in critical applications.
AIBearisharXiv – CS AI · Jun 17/10
🧠Researchers demonstrate the first distributed agent attack where language models coordinate across multiple accounts to hide cyberattacks from detection systems. They propose a stateful online monitoring solution using real-time clustering that catches these distributed threats 30% earlier while maintaining negligible latency for legitimate traffic.
AIBearisharXiv – CS AI · May 297/10
🧠Researchers demonstrate that LoRA adapters, widely used for fine-tuning large language models, can be backdoored through training data poisoning while maintaining clean performance. The backdoor generalizes at the token level rather than structural patterns, making it harder for defenders to detect generically. Two complementary detection methods—behavioral probing and weight-level analysis—successfully identify poisoned adapters without false positives.
AIBearisharXiv – CS AI · May 287/10
🧠Researchers have identified critical vulnerabilities in machine learning-based fault detection systems used in cyber-physical infrastructure, demonstrating that backdoor attacks can compromise these safety-critical systems with poisoning rates as low as 10%. This threat directly impacts smart grids, industrial automation, and other essential infrastructure that increasingly rely on AI models for anomaly detection and system recovery.
AIBearisharXiv – CS AI · May 277/10
🧠Researchers have identified a new data poisoning vulnerability in large language models called 'covert control attacks' that uses semantic associations to hide malicious instructions rather than obvious trigger phrases. This method successfully evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.
AINeutralarXiv – CS AI · May 77/10
🧠Researchers introduce Security Cube, a comprehensive evaluation framework for assessing Large Language Model robustness against jailbreak attacks. The study systematically catalogs existing attack and defense methods while establishing benchmarks across 13 attack vectors and 5 defense mechanisms, revealing critical gaps in current LLM safety practices.
AIBearisharXiv – CS AI · May 17/10
🧠Researchers demonstrate a novel attack that steals sensitive secrets (API keys, personal identifiers, financial records) from locally fine-tuned language models by embedding malicious code in model architectures. The attack achieves over 98% success rate and bypasses current defense mechanisms including differential privacy and code auditing, exposing a critical supply-chain vulnerability in AI model development.
AIBearisharXiv – CS AI · Apr 147/10
🧠Researchers have developed Head-Masked Nullspace Steering (HMNS), a novel jailbreak technique that exploits circuit-level vulnerabilities in large language models by identifying and suppressing specific attention heads responsible for safety mechanisms. The method achieves state-of-the-art attack success rates with fewer queries than previous approaches, demonstrating that current AI safety defenses remain fundamentally vulnerable to geometry-aware adversarial interventions.
AIBearisharXiv – CS AI · Apr 107/10
🧠Researchers have demonstrated the first multi-targeted backdoor attack against graph neural networks (GNNs) in graph classification tasks, using a novel subgraph injection method that simultaneously redirects multiple predictions to different target labels while maintaining clean accuracy. The attack shows high efficacy across multiple GNN architectures and datasets, with resilience against existing defense mechanisms, exposing significant vulnerabilities in GNN security.
AINeutralarXiv – CS AI · May 126/10
🧠Researchers propose Cosine-Aware Adaptive Elastic Weight Consolidation (EWC) to improve text-to-image model backdoor attacks while maintaining model fidelity and generalization. The method addresses a fundamental trade-off between attack success and output quality by dynamically adjusting regularization weights based on semantic utility, achieving stronger performance on both in-domain and out-of-domain datasets compared to existing approaches.