18 articles tagged with #ai-defense. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Apr 10 · 7/10
🧠Researchers propose HyPE and HyPS, a two-part defense framework using hyperbolic geometry to detect and neutralize harmful prompts in Vision-Language Models. The approach offers a lightweight, interpretable alternative to blacklist filters and classifier-based systems that are vulnerable to adversarial attacks.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers have developed CoopGuard, a new defense framework that uses cooperative AI agents to protect Large Language Models from sophisticated multi-round adversarial attacks. The system employs three specialized agents coordinated by a central system that maintains defense state across interactions, achieving a 78.9% reduction in attack success rates compared to existing defenses.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers identified critical security vulnerabilities in Diffusion Large Language Models (dLLMs) that differ from traditional autoregressive LLMs, stemming from their iterative generation process. They developed DiffuGuard, a training-free defense framework that reduces jailbreak attack success rates from 47.9% to 14.7% while maintaining model performance.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have identified a new attack vector called Epistemic Bias Injection (EBI) that manipulates AI language models by injecting factually correct but biased content into retrieval-augmented generation databases. The attack steers model outputs toward specific viewpoints while evading traditional detection methods, though a new defense mechanism called BiasDef shows promise in mitigating these threats.
AI · Bearish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers developed a genetic algorithm-based method using persona prompts to exploit large language models, reducing refusal rates by 50-70% across multiple LLMs. The study reveals significant vulnerabilities in AI safety mechanisms and demonstrates how these attacks can be enhanced when combined with existing methods.
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce GroupGuard, a defense framework to combat coordinated attacks by multiple AI agents in collaborative systems. The study shows group collusive attacks increase success rates by up to 15% compared to individual attacks, while GroupGuard achieves 88% detection accuracy in identifying and isolating malicious agents.
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduced VideoSafetyEval, a benchmark revealing that video-based large language models have 34.2% worse safety performance than image-based models. They developed VideoSafety-R1, a dual-stage framework that achieves 71.1% improvement in safety through alarm token-guided fine-tuning and safety-guided reinforcement learning.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠Researchers developed Sysformer, a novel approach to safeguard large language models by adapting system prompts rather than fine-tuning model parameters. The method achieved up to 80% improvement in refusing harmful prompts while maintaining 90% compliance with safe prompts across 5 different LLMs.
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers propose a game-theoretic framework using Stackelberg equilibrium and Rapidly exploring Random Trees to model interactions between attackers trying to jailbreak LLMs and defensive AI systems. The framework provides a mathematical foundation for understanding and improving AI safety guardrails against prompt-based attacks.
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers introduce WARP, a new defense mechanism for machine unlearning protocols that protects against privacy attacks where adversaries can exploit differences between pre- and post-unlearning AI models. The technique reduces attack success rates by up to 92% while maintaining model accuracy on retained data.
AI · Bullish · Fortune Crypto · 2d ago · 6/10
🧠Artemis has secured $70 million in funding to develop AI-powered defense systems against increasingly sophisticated AI-driven cyberattacks. The funding reflects growing market demand for advanced security solutions as AI-enabled threats become faster and more cost-effective to deploy.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce Critical-CoT, a defense framework that protects large language models against reasoning-level backdoor attacks by fine-tuning models to develop critical thinking behaviors. Unlike token-level backdoors, these attacks inject malicious reasoning steps into chain-of-thought processes, making them harder to detect; the proposed defense demonstrates strong robustness across multiple LLMs and datasets.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose CanaryRAG, a runtime defense mechanism that protects Retrieval-Augmented Generation systems from adversarial attacks that extract proprietary data from knowledge bases. The solution uses embedded canary tokens to detect leakage in real-time while maintaining normal system performance, offering a practical safeguard for organizations deploying RAG-based AI systems.
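The canary-token idea behind CanaryRAG can be illustrated with a minimal sketch. The function names (`plant_canaries`, `detect_leakage`), the HTML-comment hiding spot, and the token format are illustrative assumptions, not the paper's actual implementation:

```python
import secrets

def plant_canaries(documents):
    """Append a unique, unlikely-to-occur canary token to each KB document.

    Returns the marked documents and a map from canary token -> document index.
    (Illustrative: a real deployment would embed canaries less conspicuously.)
    """
    canaries = {}
    marked = []
    for i, doc in enumerate(documents):
        token = f"CANARY-{secrets.token_hex(8)}"
        canaries[token] = i
        marked.append(f"{doc}\n<!-- {token} -->")
    return marked, canaries

def detect_leakage(model_output, canaries):
    """Return indices of documents whose canary appears verbatim in the output."""
    return sorted(canaries[t] for t in canaries if t in model_output)
```

Because each canary is a high-entropy string that never occurs naturally, its verbatim appearance in a model response is strong evidence the retrieval corpus is being extracted.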
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce ToM-SB, a novel challenge where AI defenders must use theory-of-mind reasoning to deceive attackers trying to extract sensitive information. Through reinforcement learning, trained models outperform frontier LLMs like GPT-4 and Gemini-Pro, revealing an emergent bidirectional relationship between belief modeling and deception capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns: periods of abnormally low token-probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.
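The entropy-based signal DualSentinel looks for can be sketched in a few lines. This is a minimal, assumed version of the idea (the function names, window size, and threshold are illustrative; the paper's dual-check verification is not reproduced here):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def detect_entropy_lull(per_step_probs, window=4, threshold=0.5):
    """Flag sliding windows whose mean per-token entropy falls below a threshold.

    per_step_probs: one probability distribution per generated token.
    A sustained low-entropy run (an 'entropy lull') suggests the model's output
    is being forced along a narrow path, e.g. by a backdoor trigger.
    Returns the start indices of flagged windows.
    """
    entropies = [token_entropy(p) for p in per_step_probs]
    return [
        i for i in range(len(entropies) - window + 1)
        if sum(entropies[i:i + window]) / window < threshold
    ]
```

A uniform distribution over four tokens has 2 bits of entropy, while a sharply peaked one is near zero, so a run of peaked steps is what trips the detector.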
AI · Bearish · OpenAI News · Feb 25 · 6/10
🧠A new threat report analyzes how malicious actors are combining AI models with websites and social platforms to carry out attacks. The report examines the implications of these AI-powered threats for detection and defense systems.
AI · Bullish · OpenAI News · Oct 28 · 6/10
🧠Doppel has developed an AI defense system using OpenAI's GPT-5 and reinforcement fine-tuning to prevent deepfake and impersonation attacks before they spread. The system reduces analyst workloads by 80% and cuts threat response times from hours to minutes.
AI · Neutral · OpenAI News · Aug 22 · 6/10
🧠Researchers have developed a new method to evaluate neural network classifiers' ability to defend against previously unseen adversarial attacks. The approach introduces the UAR (Unforeseen Attack Robustness) metric to assess model performance against unanticipated threats and emphasizes testing across diverse attack scenarios.