#defense-mechanisms News & Analysis

7 articles tagged with #defense-mechanisms. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

7 articles

AIBearisharXiv – CS AI · Apr 137/10

🧠

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Researchers have developed XFED, a novel model poisoning attack that compromises federated learning systems without requiring attackers to communicate or coordinate with each other. The attack successfully bypasses eight state-of-the-art defenses, revealing fundamental security vulnerabilities in FL deployments that were previously underestimated.

AINeutralarXiv – CS AI · Apr 107/10

🧠

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.

AIBearisharXiv – CS AI · Apr 77/10

🧠

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Researchers conducted the first real-world safety evaluation of OpenClaw, a widely deployed AI agent with extensive system access, revealing significant security vulnerabilities. The study found that poisoning any single dimension of the agent's state increases attack success rates from 24.6% to 64-74%, with even the strongest defenses still vulnerable to 63.8% of attacks.

🧠 GPT-5🧠 Claude🧠 Sonnet

AIBullisharXiv – CS AI · Mar 177/10

🧠

Directional Embedding Smoothing for Robust Vision Language Models

Researchers have extended the RESTA defense mechanism to vision-language models (VLMs) to protect against jailbreaking attacks that can cause AI systems to produce harmful outputs. The study found that directional embedding noise significantly reduces attack success rates across the JailBreakV-28K benchmark, providing a lightweight security layer for AI agent systems.

AIBearisharXiv – CS AI · Mar 37/104

🧠

Stealthy Poisoning Attacks Bypass Defenses in Regression Settings

Researchers have developed new stealthy poisoning attacks that can bypass current defenses in regression models used across industrial and scientific applications. The study introduces BayesClean, a novel defense mechanism that better protects against these sophisticated attacks when poisoning attempts are significant.

AIBearisharXiv – CS AI · Feb 276/107

🧠

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

Researchers evaluated prompt injection and jailbreak vulnerabilities across multiple open-source LLMs including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma. The study found significant behavioral variations across models and that lightweight defense mechanisms can be consistently bypassed by long, reasoning-heavy prompts.

AINeutralOpenAI News · May 34/106

🧠

Transfer of adversarial robustness between perturbation types

The article discusses research on adversarial robustness transfer between different types of perturbations in machine learning models. This research examines how defensive techniques developed for one type of attack may provide protection against other types of adversarial examples.