#ai-security News & Analysis

216 articles tagged with #ai-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

216 articles

AIBearishDecrypt · Mar 177/10

🧠

What If Elon Musk's Grok Leaks Classified Info? Elizabeth Warren Is Worried—The Pentagon Isn't

Senator Elizabeth Warren is demanding answers after the Pentagon granted Elon Musk's xAI classified network access despite NSA warnings. The controversy centers on concerns that Musk's Grok AI could potentially leak sensitive classified information.

🏢 xAI🧠 Grok

AIBullisharXiv – CS AI · Mar 177/10

🧠

$p^2$RAG: Privacy-Preserving RAG Service Supporting Arbitrary Top-$k$ Retrieval

Researchers propose p²RAG, a new privacy-preserving Retrieval-Augmented Generation system that supports arbitrary top-k retrieval while being 3-300x faster than existing solutions. The system uses an interactive bisection method instead of sorting and employs secret sharing across two servers to protect user prompts and database content.

$RAG

AIBullisharXiv – CS AI · Mar 177/10

🧠

RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks

Researchers propose RESQ, a three-stage framework that enhances both security and reliability of quantized deep neural networks through specialized fine-tuning techniques. The framework demonstrates up to 10.35% improvement in attack resilience and 12.47% in fault resilience while maintaining competitive accuracy across multiple neural network architectures.

AIBearisharXiv – CS AI · Mar 177/10

🧠

AI Evasion and Impersonation Attacks on Facial Re-Identification with Activation Map Explanations

Researchers developed a novel framework for generating adversarial patches that can fool facial recognition systems through both evasion and impersonation attacks. The method reduces facial recognition accuracy from 90% to 0.4% in white-box settings and demonstrates strong cross-model generalization, highlighting critical vulnerabilities in surveillance systems.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Architecture-Agnostic Feature Synergy for Universal Defense Against Heterogeneous Generative Threats

Researchers propose ATFS, a new framework that provides universal defense against multiple generative AI architectures simultaneously, overcoming limitations of current defense mechanisms that only work against specific AI models. The system achieves over 90% protection effectiveness within 40 iterations and works across different generative models including Diffusion Models, GANs, and VQ-VAE.

AINeutralarXiv – CS AI · Mar 177/10

🧠

GroupGuard: A Framework for Modeling and Defending Collusive Attacks in Multi-Agent Systems

Researchers introduce GroupGuard, a defense framework to combat coordinated attacks by multiple AI agents in collaborative systems. The study shows group collusive attacks increase success rates by up to 15% compared to individual attacks, while GroupGuard achieves 88% detection accuracy in identifying and isolating malicious agents.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Researchers developed a two-agent defense system called OpenClaw that achieved 0% attack success rate against prompt injection attacks on LLM applications. The system uses agent isolation and JSON formatting to structurally prevent malicious prompts from reaching action-taking agents.

AIBearisharXiv – CS AI · Mar 177/10

🧠

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Researchers developed SWhisper, a framework that uses near-ultrasonic audio to deliver covert jailbreak attacks against speech-driven AI systems. The technique is inaudible to humans but can successfully bypass AI safety measures with up to 94% effectiveness on commercial models.

AINeutralarXiv – CS AI · Mar 177/10

🧠

What Counts as Real? Speech Restoration and Voice Quality Conversion Pose New Challenges to Deepfake Detection

Researchers demonstrate that current audio deepfake detection systems incorrectly classify legitimate speech processing technologies like voice conversion and restoration as fake audio. A new multi-class detection approach shows improved accuracy by distinguishing between authentic speech, benign modifications, and actual spoofing attempts.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Directional Embedding Smoothing for Robust Vision Language Models

Researchers have extended the RESTA defense mechanism to vision-language models (VLMs) to protect against jailbreaking attacks that can cause AI systems to produce harmful outputs. The study found that directional embedding noise significantly reduces attack success rates across the JailBreakV-28K benchmark, providing a lightweight security layer for AI agent systems.

AIBearishMIT Technology Review · Mar 167/10

🧠

Where OpenAI’s technology could show up in Iran

OpenAI recently reached a controversial agreement allowing the Pentagon to use its AI technology in classified environments. The article explores concerns about potential proliferation of OpenAI's technology to Iran and other restricted regions.

🏢 OpenAI

AIBearisharXiv – CS AI · Mar 167/10

🧠

Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation

Research reveals critical vulnerabilities in Vision-Language-Action robotic models that use chain-of-thought reasoning, where corrupting object names in internal reasoning traces can reduce task success rates by up to 45%. The study shows these AI systems are vulnerable to attacks on their internal reasoning processes, even when primary inputs remain untouched.

AINeutralarXiv – CS AI · Mar 167/10

🧠

On Deepfake Voice Detection -- It's All in the Presentation

Researchers have identified why current deepfake voice detection systems fail in real-world applications, finding that existing datasets don't account for how audio changes when transmitted through communication channels. A new framework improved detection accuracy by 39-57% and emphasizes that better datasets matter more than larger AI models for effective deepfake detection.

AIBearisharXiv – CS AI · Mar 167/10

🧠

Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Researchers have identified a critical vulnerability in image protection systems that use adversarial perturbations to prevent unauthorized AI editing. Two new purification methods can effectively remove these protections, creating a 'purify-once, edit-freely' attack where images become vulnerable to unlimited manipulation.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Repurposing Backdoors for Good: Ephemeral Intrinsic Proofs for Verifiable Aggregation in Cross-silo Federated Learning

Researchers propose a novel lightweight architecture for verifiable aggregation in federated learning that uses backdoor injection as intrinsic proofs instead of expensive cryptographic methods. The approach achieves over 1000x speedup compared to traditional cryptographic baselines while maintaining high detection rates against malicious servers.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

Researchers developed a new framework for evaluating AI security risks specifically in banking and financial services, introducing the Risk-Adjusted Harm Score (RAHS) to measure severity of AI model failures. The study found that AI models become more vulnerable to security exploits during extended interactions, exposing critical weaknesses in current AI safety assessments for financial institutions.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

Researchers have developed a new method to detect and eliminate backdoor triggers in neural networks using active path analysis. The approach shows promising results in experiments with machine learning models used for intrusion detection, addressing a critical cybersecurity vulnerability.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Na\"ive Exposure of Generative AI Capabilities Undermines Deepfake Detection

Researchers demonstrate that commercial AI chatbot interfaces inadvertently expose capabilities that allow adversaries to bypass deepfake detection systems using only policy-compliant prompts. The study reveals that current deepfake detectors fail against semantic-preserving image refinement techniques enabled by widely accessible AI systems.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities

Researchers have identified critical security vulnerabilities in the Model Context Protocol (MCP), a new standard for AI agent interoperability. The study reveals that MCP's flexible compatibility features create attack surfaces that enable silent prompt injection, denial-of-service attacks, and other exploits across multi-language SDK implementations.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.

🧠 Gemini

AIBearisharXiv – CS AI · Mar 117/10

🧠

When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

Researchers have developed UPA-RFAS, a new adversarial attack framework that can successfully fool Vision-Language-Action (VLA) models used in robotics with universal physical patches that transfer across different models and real-world scenarios. The attack exploits vulnerabilities in AI-powered robots by using patches that can hijack attention mechanisms and cause semantic misalignment between visual and text inputs.

AIBearisharXiv – CS AI · Mar 117/10

🧠

Security Considerations for Multi-agent Systems

A comprehensive study reveals that multi-agent AI systems (MAS) face distinct security vulnerabilities that existing frameworks inadequately address. The research evaluated 16 AI security frameworks against 193 identified threats across 9 categories, finding that no framework achieves majority coverage in any single category, with non-determinism and data leakage being the most under-addressed areas.

AIBullishOpenAI News · Mar 107/10

🧠

Improving instruction hierarchy in frontier LLMs

A new training method called IH-Challenge has been developed to improve instruction hierarchy in frontier large language models. The approach helps models better prioritize trusted instructions, enhancing safety controls and reducing vulnerability to prompt injection attacks.

AIBullishOpenAI News · Mar 97/10

🧠

OpenAI to acquire Promptfoo

OpenAI is acquiring Promptfoo, an AI security platform that specializes in helping enterprises identify and fix vulnerabilities in AI systems during the development process. This acquisition strengthens OpenAI's security capabilities and enterprise offerings.

🏢 OpenAI

AIBullisharXiv – CS AI · Mar 97/10

🧠

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Researchers developed Sysformer, a novel approach to safeguard large language models by adapting system prompts rather than fine-tuning model parameters. The method achieved up to 80% improvement in refusing harmful prompts while maintaining 90% compliance with safe prompts across 5 different LLMs.

← PrevPage 3 of 9Next →