216 articles tagged with #ai-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠BinaryShield is the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries for LLM services. The system addresses the critical security gap where organizations cannot share prompt injection attack intelligence between services due to privacy regulations, achieving an F1-score of 0.94 while providing 38x faster similarity search than dense embeddings.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers propose TDAE, a new defense framework that protects images from malicious AI-powered edits by using imperceptible perturbations and coordinated image-text optimization. The system employs FlatGrad Defense Mechanism for visual protection and Dynamic Prompt Defense for textual enhancement, achieving better cross-model transferability than existing methods.
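The general recipe behind such protections is a bounded adversarial perturbation: shift the features a surrogate editing model relies on while clamping every pixel change below a perceptibility threshold. A generic sketch with a toy linear surrogate (this is not the paper's FlatGrad mechanism, just the standard sign-step idea):

```python
def protect_image(pixels, w, eps=0.03, alpha=0.01, steps=10):
    """Nudge pixels to shift a toy linear feature f(x) = sum(w_i * x_i)
    that a surrogate editor relies on, clamping each change to
    [-eps, eps] so the perturbation stays imperceptible."""
    delta = [0.0] * len(pixels)
    for _ in range(steps):
        # d(f)/d(delta_i) = w_i, so a sign step maximally shifts f per pixel
        delta = [min(eps, max(-eps, d + alpha * (1 if wi >= 0 else -1)))
                 for d, wi in zip(delta, w)]
    return [p + d for p, d in zip(pixels, delta)]

pixels = [0.2, 0.5, 0.8, 0.1]
w = [1.0, -2.0, 0.5, -0.3]       # hypothetical surrogate weights
protected = protect_image(pixels, w)
shift = sum(wi * (q - p) for wi, p, q in zip(w, pixels, protected))
print(shift > 0)   # surrogate features moved, pixels barely changed
```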
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce SEAM, a novel defense mechanism that makes large language models 'self-destructive' when adversaries attempt harmful fine-tuning attacks. The system allows models to function normally for legitimate tasks but causes catastrophic performance degradation when fine-tuned on harmful data, creating robust protection against malicious modifications.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 3
🧠Researchers introduce Multi-PA, a comprehensive benchmark for evaluating privacy risks in Large Vision-Language Models (LVLMs), covering 26 categories of personal privacy, 15 of trade secrets, and 18 of state secrets across 31,962 samples. Testing 21 open-source and 2 closed-source LVLMs revealed significant privacy vulnerabilities, with most models posing a high risk of facilitating privacy breaches across categories.
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers have developed new stealthy poisoning attacks that bypass current defenses in regression models used across industrial and scientific applications. The study also introduces BayesClean, a defense mechanism that offers stronger protection against these attacks when the poisoning rate is substantial.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers discovered a vulnerability in AI music and video generation systems where phonetic prompts can bypass copyright filters. The 'Adversarial PhoneTic Prompting' attack achieves 91% similarity to copyrighted content by using sound-alike phrases that preserve acoustic patterns while evading text-based detection.
$NEAR · $APT
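The evasion works because copyright filters match text while generation models respond to sound. A toy illustration of the gap, using a simplified Soundex code as a stand-in for real phonetic matching (not the paper's method):

```python
# Simplified Soundex: words that sound alike map to the same code.
_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
          **dict.fromkeys("dt", "3"), "l": "4",
          **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(word: str) -> str:
    word = word.lower()
    out, prev = word[0].upper(), _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = "" if ch in "aeiouy" else code
    return (out + "000")[:4]

blocked = "hey jude"
attack = "hay jood"              # sound-alike rewrite of the lyric
print(blocked in attack)         # False -> a substring filter is evaded
print([soundex(w) for w in blocked.split()] ==
      [soundex(w) for w in attack.split()])   # True -> acoustics preserved
```

A text-based filter sees two unrelated strings, but the phonetic codes agree, which is exactly the channel the attack exploits.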
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Researchers have developed AgentSentry, a novel defense framework that protects AI agents from indirect prompt injection attacks by detecting and mitigating malicious control attempts in real-time. The system achieved 74.55% utility under attack, significantly outperforming existing defenses by 20-33 percentage points while maintaining benign performance.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed Dyslexify, a training-free defense mechanism against typographic attacks on CLIP vision models that inject malicious text into images. The method selectively disables attention heads responsible for text processing, improving robustness by up to 22% while maintaining 99% of standard performance.
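Mechanically, "disabling attention heads" can be as simple as zeroing the per-head outputs before they are concatenated and projected. A minimal sketch (the head indices here are hypothetical; the paper identifies text-processing heads empirically):

```python
def ablate_heads(head_outputs, head_mask):
    """Zero the contribution of masked heads before concatenation.

    head_outputs: one output vector per attention head.
    head_mask:    1.0 keeps a head, 0.0 silences it.
    """
    return [[x * m for x in out] for out, m in zip(head_outputs, head_mask)]

# 4 heads with 2-dim outputs; pretend heads 1 and 3 read rendered text.
outs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mask = [1.0, 0.0, 1.0, 0.0]
print(ablate_heads(outs, mask))
# [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0], [0.0, 0.0]]
```

Because only a few heads are silenced and no weights are retrained, the rest of the model's behavior is untouched, which is why standard performance barely drops.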
AI × Crypto · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 3
🤖Researchers introduce IMMACULATE, a framework that audits commercial large language model API services to detect fraud like model substitution and token overbilling without requiring access to internal systems. The system uses verifiable computation to audit a small fraction of requests, achieving strong detection guarantees with less than 1% throughput overhead.
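The "small fraction of requests" guarantee follows from simple sampling arithmetic: if each request is independently audited with probability p, a provider that cheats on k requests escapes detection only with probability (1-p)^k. A sketch of that bound (the paper's actual verifiable-computation protocol is more involved):

```python
def detection_probability(audit_rate: float, n_fraudulent: int) -> float:
    """P(at least one fraudulent response is audited) when each request
    is independently sampled for audit with probability audit_rate."""
    return 1.0 - (1.0 - audit_rate) ** n_fraudulent

# Even a 1% audit rate makes sustained model substitution hard to hide.
for k in (10, 100, 1000):
    print(k, round(detection_probability(0.01, k), 4))
```

Sustained fraud is nearly certain to be caught, which is how a sub-1% audit overhead can still yield strong detection guarantees.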
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 3
🧠Researchers have developed DropVLA, a backdoor attack method that can manipulate Vision-Language-Action AI models to execute unintended robot actions while maintaining normal performance. The attack achieves 98.67%-99.83% success rates with minimal data poisoning and has been validated on real robotic systems.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers demonstrate how training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.
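The "blockchain-like verification" idea reduces to chaining digests over training batches so that tampering with any batch invalidates every digest after it. A sketch with SHA-256 standing in for the paper's post-quantum signatures:

```python
import hashlib

def chain_hash(prev_digest: bytes, batch: bytes) -> bytes:
    """Each digest commits to the previous one, so later tampering with
    an earlier batch breaks every subsequent digest in the ledger."""
    return hashlib.sha256(prev_digest + batch).digest()

batches = [b"batch-0", b"batch-1", b"batch-2"]
digest, ledger = b"\x00" * 32, []
for b in batches:
    digest = chain_hash(digest, b)
    ledger.append(digest)

def verify(batches, ledger):
    """Replay the chain; any poisoned batch fails the comparison."""
    digest = b"\x00" * 32
    for b, expected in zip(batches, ledger):
        digest = chain_hash(digest, b)
        if digest != expected:
            return False
    return True

print(verify(batches, ledger))                              # True
print(verify([b"batch-0", b"POISON", b"batch-2"], ledger))  # False
```

A real deployment would sign the ledger entries (the paper proposes post-quantum signatures) so an attacker who can rewrite batches cannot also rewrite the ledger.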
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers propose Random Parameter Pruning Attack (RaPA), a new method that improves targeted adversarial attacks by randomly pruning model parameters during optimization. The technique achieves up to 11.7% higher attack success rates when transferring from CNN to Transformer models compared to existing methods.
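The intuition is ensemble-like: randomly pruning surrogate parameters at each step stops the perturbation from overfitting to one model's exact weights, which is what helps CNN-to-Transformer transfer. A toy sketch on a linear surrogate (illustrative only, not the paper's exact procedure):

```python
import random

def rapa_attack(x, w, steps=50, prune_frac=0.3, alpha=0.01, seed=0):
    """Toy targeted attack on a linear surrogate score(x) = sum(w_i * x_i).

    Each step randomly zeroes a fraction of the surrogate's parameters
    before the gradient step, so no single weight dominates the
    perturbation -- the RaPA-style pruning idea."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        mask = [0.0 if rng.random() < prune_frac else 1.0 for _ in w]
        # gradient of score wrt x under the pruned surrogate is mask * w
        x = [xi + alpha * (1 if mi * wi > 0 else -1 if mi * wi < 0 else 0)
             for xi, mi, wi in zip(x, mask, w)]
    return x

x0 = [0.0, 0.0, 0.0, 0.0]
w = [0.5, -0.2, 0.8, -0.9]      # hypothetical surrogate weights
adv = rapa_attack(x0, w)
score = sum(wi * xi for wi, xi in zip(w, adv))
print(score > 0)   # the perturbation raised the target score
```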
AI · Bearish · CoinTelegraph – AI · Feb 25 · 7/10 · 4
🧠Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.
AI · Bearish · Ars Technica – AI · Feb 19 · 7/10 · 7
🧠Meta and other major AI companies have restricted the use of OpenClaw, a viral agentic AI tool, due to security concerns. The tool is recognized for its high capabilities but criticized for being wildly unpredictable in its behavior.
AI × Crypto · Bearish · DL News · Feb 19 · 7/10 · 8
🤖OpenAI has released a new crypto security tool following a costly incident where AI-generated code from Claude caused a $2.7 million bug that affected Moonwell users. The timing suggests a response to growing concerns about AI-generated code vulnerabilities in cryptocurrency applications.
AI × Crypto · Bullish · OpenAI News · Feb 18 · 7/10 · 8
🤖OpenAI and Paradigm have launched EVMbench, a new benchmark tool designed to evaluate AI agents' capabilities in detecting, patching, and exploiting high-severity vulnerabilities in smart contracts. This collaboration represents a significant step toward improving smart contract security through AI-powered analysis tools.
AI × Crypto · Bearish · CoinTelegraph – AI · Feb 9 · 7/10 · 5
🤖SlowMist security firm has identified 472 malicious AI skills on the OpenClaw AI hub containing dangerous code. This represents a growing trend of hackers targeting AI plugins and extensions to gain access to cryptocurrency investors' devices.
AI · Neutral · Google DeepMind Blog · Dec 11 · 7/10 · 4
🧠Google DeepMind and the UK AI Security Institute (AISI) are strengthening their collaboration on critical AI safety and security research. This partnership aims to advance research in AI safety measures and security protocols.
AI · Neutral · OpenAI News · Nov 19 · 7/10 · 6
🧠OpenAI has released a system card for GPT-5.1-CodexMax detailing comprehensive safety measures including specialized training against harmful tasks and prompt injections. The document outlines both model-level and product-level mitigations such as agent sandboxing and configurable network access controls.
AI · Neutral · OpenAI News · Nov 12 · 7/10 · 6
🧠OpenAI is resisting the New York Times' request for access to 20 million private ChatGPT conversations, while simultaneously implementing enhanced security and privacy protections for user data. This legal dispute highlights growing tensions over data privacy and corporate access to AI conversation logs.
AI · Neutral · OpenAI News · Nov 7 · 7/10 · 7
🧠Prompt injections represent a significant security vulnerability in AI systems, requiring specialized research and countermeasures. OpenAI is actively developing safeguards and training methods to protect users from these frontier attacks.
AI · Bullish · OpenAI News · Oct 30 · 7/10 · 6
🧠OpenAI has launched Aardvark, an AI-powered autonomous security researcher that can find, validate, and help fix software vulnerabilities at scale. The system is currently in private beta with early testing available through sign-up.
AI · Bullish · OpenAI News · Sep 12 · 7/10 · 8
🧠OpenAI has announced progress on its partnership with the US CAISI and UK AISI to enhance AI safety and security systems. The collaboration focuses on strengthening safeguards and security measures for AI development and deployment.
AI · Neutral · Hugging Face Blog · May 24 · 7/10 · 7
🧠CyberSecEval 2 is a comprehensive evaluation framework designed to assess cybersecurity risks and capabilities of Large Language Models. The framework aims to provide standardized metrics for evaluating AI model security vulnerabilities and defensive capabilities in cybersecurity contexts.