y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#ai-security News & Analysis

186 articles tagged with #ai-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

186 articles
AIBearisharXiv – CS AI · Mar 47/104
🧠

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Researchers discovered a critical security vulnerability in AI-powered GUI agents on Android, where malicious apps can hijack agent actions without requiring dangerous permissions. The 'Action Rebinding' attack exploits timing gaps between AI observation and action, achieving 100% success rates in tests across six popular Android GUI agents.

AIBearisharXiv – CS AI · Mar 47/103
🧠

Semantic-level Backdoor Attack against Text-to-Image Diffusion Models

Researchers have developed SemBD, a new semantic-level backdoor attack against text-to-image diffusion models that achieves 100% success rate while evading current defenses. The attack uses continuous semantic regions as triggers rather than fixed textual patterns, making it significantly harder to detect and defend against.

AIBearishFortune Crypto · Mar 37/103
🧠

Boards aren’t ready for the AI age: What happens when your CEO gets deepfaked?

Deepfake attacks targeting CEO likenesses have escalated from cybersecurity concerns to immediate boardroom threats, yet most companies lack preparedness plans. This represents a significant vulnerability as AI-generated impersonations become more sophisticated and accessible to malicious actors.

Boards aren’t ready for the AI age: What happens when your CEO gets deepfaked?
AIBearisharXiv – CS AI · Mar 37/104
🧠

VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents

Researchers have identified critical security vulnerabilities in Computer-Use Agents (CUAs) through Visual Prompt Injection attacks, where malicious instructions are embedded in user interfaces. Their VPI-Bench study shows CUAs can be deceived at rates up to 51% and Browser-Use Agents up to 100% on certain platforms, with current defenses proving inadequate.

AIBullisharXiv – CS AI · Mar 37/105
🧠

Self-Destructive Language Model

Researchers introduce SEAM, a novel defense mechanism that makes large language models 'self-destructive' when adversaries attempt harmful fine-tuning attacks. The system allows models to function normally for legitimate tasks but causes catastrophic performance degradation when fine-tuned on harmful data, creating robust protection against malicious modifications.

AIBearisharXiv – CS AI · Mar 37/103
🧠

Multi-PA: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models

Researchers introduce Multi-PA, a comprehensive benchmark for evaluating privacy risks in Large Vision-Language Models (LVLMs), covering 26 personal privacy categories, 15 trade secrets, and 18 state secrets across 31,962 samples. Testing 21 open-source and 2 closed-source LVLMs revealed significant privacy vulnerabilities, with models generally posing high risks of facilitating privacy breaches across different privacy categories.

AINeutralarXiv – CS AI · Mar 37/104
🧠

Trojans in Artificial Intelligence (TrojAI) Final Report

IARPA's TrojAI program investigated AI Trojans - malicious backdoors hidden in AI models that can cause system failures or allow unauthorized control. The multi-year initiative developed detection methods through weight analysis and trigger inversion, while identifying ongoing challenges in AI security that require continued research.

AINeutralarXiv – CS AI · Mar 37/103
🧠

Towards Transferable Defense Against Malicious Image Edits

Researchers propose TDAE, a new defense framework that protects images from malicious AI-powered edits by using imperceptible perturbations and coordinated image-text optimization. The system employs FlatGrad Defense Mechanism for visual protection and Dynamic Prompt Defense for textual enhancement, achieving better cross-model transferability than existing methods.

AIBullisharXiv – CS AI · Mar 37/104
🧠

BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints

BinaryShield is the first privacy-preserving threat intelligence system that enables secure sharing of attack fingerprints across compliance boundaries for LLM services. The system addresses the critical security gap where organizations cannot share prompt injection attack intelligence between services due to privacy regulations, achieving an F1-score of 0.94 while providing 38x faster similarity search than dense embeddings.

AIBearisharXiv – CS AI · Mar 37/103
🧠

ERIS: Evolutionary Real-world Interference Scheme for Jailbreaking Audio Large Models

Researchers developed ERIS, a new framework that uses genetic algorithms to exploit Audio Large Models (ALMs) by disguising malicious instructions as natural speech with background noise. The system can bypass safety filters by embedding harmful content in real-world audio interference that appears harmless to humans and security systems.

AIBearisharXiv – CS AI · Mar 37/104
🧠

Stealthy Poisoning Attacks Bypass Defenses in Regression Settings

Researchers have developed new stealthy poisoning attacks that can bypass current defenses in regression models used across industrial and scientific applications. The study introduces BayesClean, a novel defense mechanism that better protects against these sophisticated attacks when poisoning attempts are significant.

AIBullisharXiv – CS AI · Feb 277/106
🧠

TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI

Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.

AI × CryptoBullisharXiv – CS AI · Feb 277/103
🤖

IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation

Researchers introduce IMMACULATE, a framework that audits commercial large language model API services to detect fraud like model substitution and token overbilling without requiring access to internal systems. The system uses verifiable computation to audit a small fraction of requests, achieving strong detection guarantees with less than 1% throughput overhead.

AIBullisharXiv – CS AI · Feb 277/105
🧠

Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

Researchers developed Dyslexify, a training-free defense mechanism against typographic attacks on CLIP vision models that inject malicious text into images. The method selectively disables attention heads responsible for text processing, improving robustness by up to 22% while maintaining 99% of standard performance.

AIBearisharXiv – CS AI · Feb 277/105
🧠

Poisoned Acoustics

Researchers demonstrate how training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.

AIBullisharXiv – CS AI · Feb 277/104
🧠

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Researchers have developed AgentSentry, a novel defense framework that protects AI agents from indirect prompt injection attacks by detecting and mitigating malicious control attempts in real-time. The system achieved 74.55% utility under attack, significantly outperforming existing defenses by 20-33 percentage points while maintaining benign performance.

AIBearisharXiv – CS AI · Feb 277/103
🧠

DropVLA: An Action-Level Backdoor Attack on Vision--Language--Action Models

Researchers have developed DropVLA, a backdoor attack method that can manipulate Vision-Language-Action AI models to execute unintended robot actions while maintaining normal performance. The attack achieves 98.67%-99.83% success rates with minimal data poisoning and has been validated on real robotic systems.

AINeutralarXiv – CS AI · Feb 277/106
🧠

RaPA: Enhancing Transferable Targeted Attacks via Random Parameter Pruning

Researchers propose Random Parameter Pruning Attack (RaPA), a new method that improves targeted adversarial attacks by randomly pruning model parameters during optimization. The technique achieves up to 11.7% higher attack success rates when transferring from CNN to Transformer models compared to existing methods.

AIBearisharXiv – CS AI · Feb 277/107
🧠

Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

Researchers discovered a vulnerability in AI music and video generation systems where phonetic prompts can bypass copyright filters. The 'Adversarial PhoneTic Prompting' attack achieves 91% similarity to copyrighted content by using sound-alike phrases that preserve acoustic patterns while evading text-based detection.

$NEAR$APT
AIBearishCoinTelegraph – AI · Feb 257/104
🧠

Anthropic says it's been targeted in massive distillation attacks

Anthropic alleges that Chinese AI companies DeepSeek, Moonshot, and MiniMax conducted massive distillation attacks against its Claude AI system, creating 24,000 accounts and making 16 million exchanges to scrape training data. This represents a significant case of AI model theft and highlights growing tensions in the global AI competition.

Anthropic says it's been targeted in massive distillation attacks
AIBearishArs Technica – AI · Feb 197/107
🧠

OpenClaw security fears lead Meta, other AI firms to restrict its use

Meta and other major AI companies have restricted the use of OpenClaw, a viral agentic AI tool, due to security concerns. The tool is recognized for its high capabilities but criticized for being wildly unpredictable in its behavior.

AI × CryptoBearishDL News · Feb 197/108
🤖

OpenAI releases crypto security tool as Claude blamed for $2.7m Moonwell bug

OpenAI has released a new crypto security tool following a costly incident where AI-generated code from Claude caused a $2.7 million bug that affected Moonwell users. The timing suggests a response to growing concerns about AI-generated code vulnerabilities in cryptocurrency applications.

AI × CryptoBullishOpenAI News · Feb 187/108
🤖

Introducing EVMbench

OpenAI and Paradigm have launched EVMbench, a new benchmark tool designed to evaluate AI agents' capabilities in detecting, patching, and exploiting high-severity vulnerabilities in smart contracts. This collaboration represents a significant step toward improving smart contract security through AI-powered analysis tools.

AI × CryptoBearishCoinTelegraph – AI · Feb 97/105
🤖

OpenClaw AI hub faces wave of poisoned plugins, SlowMist warns

SlowMist security firm has identified 472 malicious AI skills on the OpenClaw AI hub containing dangerous code. This represents a growing trend of hackers targeting AI plugins and extensions to gain access to cryptocurrency investors' devices.

OpenClaw AI hub faces wave of poisoned plugins, SlowMist warns
AINeutralGoogle DeepMind Blog · Dec 117/104
🧠

Deepening our partnership with the UK AI Security Institute

Google DeepMind and the UK AI Security Institute (AISI) are strengthening their collaboration on critical AI safety and security research. This partnership aims to advance research in AI safety measures and security protocols.

← PrevPage 4 of 8Next →