#ai-vulnerabilities News & Analysis

24 articles tagged with #ai-vulnerabilities. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

24 articles

AI × CryptoNeutralCrypto Briefing · Jun 237/10

🤖

Visa showcases Project Glasswing’s findings on AI security risks at VB Transform 2026

Visa presented findings from Project Glasswing at VB Transform 2026, highlighting critical security vulnerabilities associated with AI systems. The research underscores AI's dual nature as both a defensive tool and a potential threat, driving urgent need for enhanced cybersecurity infrastructure and strategies.

GeneralBearishDaily Hodl · Jun 107/10

📰

Meta Discloses Instagram Data Breach As Cyberthieves Access up to 20,225 Accounts – Contact Info and Messages at Risk

Meta disclosed a data breach affecting up to 20,225 Instagram accounts in April 2026, with cybercriminals exploiting the platform's "High Touch Support" AI-assisted account recovery system. Compromised data includes contact information and private messages, raising significant concerns about user privacy and the security vulnerabilities of AI-powered authentication systems.

AIBearisharXiv – CS AI · Jun 107/10

🧠

Assessing Automated Prompt Injection Attacks in Agentic Environments

Researchers have evaluated automated prompt injection attacks against large language model agents using both white-box and black-box optimization methods, finding that black-box approaches significantly outperform gradient-based techniques in realistic agentic settings. While task-universal attacks transfer effectively across domains, attacks trained on smaller models fail to generalize to frontier models like GPT-5, suggesting model-dependent vulnerabilities rather than universal exploits.

🧠 GPT-5

AIBearisharXiv – CS AI · Jun 107/10

🧠

BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Researchers introduce BadRobot, an attack paradigm that exploits vulnerabilities in embodied LLM agents to make them perform harmful physical actions through voice commands. The study demonstrates successful attacks against prominent frameworks like Voxposer and Code as Policies, revealing critical safety gaps in AI systems integrated into physical robotics.

AIBearisharXiv – CS AI · Jun 107/10

🧠

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines

Researchers demonstrate that LLM-based search engines are vulnerable to ranking manipulation attacks, where adversaries craft content to game results. Using game theory, the study reveals that reducing attack success rates can paradoxically incentivize attacks, and defensive caps may fail—highlighting the need for adaptive security strategies beyond traditional defenses.

AIBearisharXiv – CS AI · Jun 97/10

🧠

Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models

Researchers demonstrate that popular EEG foundation models (BIOT, LaBraM, EEGPT) leak sensitive neurological attributes despite appearing secure under individual audits. A cross-encoder transfer attack shows that attribute decoders trained on one frozen model successfully transfer to others, indicating shared vulnerabilities that standard defenses like differential privacy fail to adequately address.

AIBullishFortune Crypto · Jun 87/10

🧠

Why Lightspeed and Wiz’s Assaf Rappaport bet $37 million on an AI-powered cyberattacker

Assaf Rappaport, co-founder of cloud security unicorn Wiz, is leading a $37 million investment into an AI-powered cybersecurity startup designed to autonomously defend against AI-native attackers. The move reflects growing industry recognition that frontier AI models are exposing thousands of previously unknown vulnerabilities, necessitating next-generation defensive capabilities.

AIBearisharXiv – CS AI · Jun 57/10

🧠

SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks

Researchers introduce SlotGCG, a novel jailbreak attack method that exploits positional vulnerabilities in large language models by strategically inserting adversarial tokens at optimal positions within prompts rather than just at the end. The approach achieves 14% higher success rates than existing GCG-based attacks while identifying that LLM vulnerability is significantly dependent on token insertion location.

AIBearisharXiv – CS AI · Jun 47/10

🧠

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

Researchers demonstrate that LLM agents are vulnerable to credential exfiltration attacks when sensitive data shares context windows with untrusted content, enabling indirect prompt injection. The study proposes three defense mechanisms: activation probes for pre-output detection, honeytokens with calibrated thresholds, and multi-turn leakage accounting to prevent cumulative credential theft across conversations.

AIBearisharXiv – CS AI · May 297/10

🧠

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Researchers present MemPoison, a novel attack that exploits vulnerabilities in large language model agents by injecting malicious information into their long-term memory through dialogue interactions. The attack achieves up to 95% success rates by using semantic bridges, entity masquerading, and embedding optimization to bypass modern selective memory mechanisms, revealing critical security gaps in autonomous AI systems.

AIBearisharXiv – CS AI · May 297/10

🧠

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

A comprehensive arXiv research review examines vulnerabilities in Large Language Models, particularly prompt injection and jailbreaking attacks, while analyzing existing defense mechanisms. The study identifies critical security gaps and proposes future research directions for safer LLM deployment across applications.

AIBearisharXiv – CS AI · May 277/10

🧠

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Researchers have identified a new data poisoning vulnerability in large language models called 'covert control attacks' that uses semantic associations to hide malicious instructions rather than obvious trigger phrases. This method successfully evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.

AINeutralarXiv – CS AI · May 127/10

🧠

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

Researchers have identified a compact causal mechanism explaining how large language models can be persuaded to abandon factual knowledge through the manipulation of mid-layer attention heads. The vulnerability operates as a discrete latent switch rather than confidence reduction, with persuasion working by redirecting attention via a rank-one feature built from persuasive keywords, revealing persuasion as a narrow and potentially monitorable circuit.

AIBearisharXiv – CS AI · May 47/10

🧠

Exploring LLM biases to manipulate AI search overview

Researchers demonstrate that Large Language Models used in AI search overview systems are vulnerable to bias manipulation through reinforcement learning-optimized snippet rewriting. The study reveals that adversaries can exploit LLM biases to influence search result rankings and generate inaccurate or harmful information, posing significant security risks to AI-powered search applications.

AIBearisharXiv – CS AI · Mar 277/10

🧠

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Research reveals that LLM system prompt configuration creates massive security vulnerabilities, with the same model's phishing detection rates ranging from 1% to 97% based solely on prompt design. The study PhishNChips demonstrates that more specific prompts can paradoxically weaken AI security by replacing robust multi-signal reasoning with exploitable single-signal dependencies.

AINeutralOpenAI News · Mar 257/10

🧠

Introducing the OpenAI Safety Bug Bounty program

OpenAI has launched a Safety Bug Bounty program designed to identify and address AI safety risks and potential abuse vectors. The program specifically targets vulnerabilities including agentic risks, prompt injection attacks, and data exfiltration threats.

🏢 OpenAI

AIBearisharXiv – CS AI · Mar 167/10

🧠

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Researchers have released MalURLBench, the first benchmark to evaluate how LLM-based web agents handle malicious URLs, revealing significant vulnerabilities across 12 popular models. The study found that existing AI agents struggle to detect disguised malicious URLs and proposed URLGuard as a defensive solution.

AIBearisharXiv – CS AI · Mar 117/10

🧠

NetDiffuser: Deceiving DNN-Based Network Attack Detection Systems with Diffusion-Generated Adversarial Traffic

Researchers developed NetDiffuser, a framework that uses diffusion models to generate natural adversarial examples capable of deceiving AI-based network intrusion detection systems. The system achieved up to 29.93% higher attack success rates compared to baseline attacks, highlighting significant vulnerabilities in current deep learning-based security systems.

AIBearisharXiv – CS AI · Mar 37/103

🧠

Untargeted Jailbreak Attack

Researchers have developed a new 'untargeted jailbreak attack' (UJA) that can compromise AI safety systems in large language models with over 80% success rate using only 100 optimization iterations. This gradient-based attack method expands the search space by maximizing unsafety probability without fixed target responses, outperforming existing attacks by over 30%.

AIBearishThe Register – AI · Mar 256/10

🧠

AI supply chain attacks don’t even require malware…just post poisoned documentation

The article title suggests a new type of AI supply chain attack that doesn't require traditional malware, instead using poisoned documentation as the attack vector. However, no article body content was provided for analysis.

AIBearishThe Register – AI · Mar 47/10

🧠

AI doctor's assistant is easily swayed to change prescriptions, give bad medical advice

Research reveals that AI-powered medical assistant systems can be easily manipulated to change prescriptions and provide harmful medical advice. The study highlights significant vulnerabilities in AI healthcare tools that could pose serious risks to patient safety.

AIBearisharXiv – CS AI · Mar 37/107

🧠

CaptionFool: Universal Image Captioning Model Attacks

Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.

AIBearishIEEE Spectrum – AI · Jan 216/105

🧠

Why AI Keeps Falling for Prompt Injection Attacks

Large language models (LLMs) remain highly vulnerable to prompt injection attacks where specific phrasing can override safety guardrails, causing AI systems to perform forbidden actions or reveal sensitive information. Unlike humans who use contextual judgment and layered defenses, current LLMs lack the ability to assess situational appropriateness and cannot universally prevent such attacks.

AIBearishOpenAI News · Feb 246/105

🧠

Attacking machine learning with adversarial examples

Adversarial examples are specially crafted inputs designed to fool machine learning models into making incorrect predictions, functioning like optical illusions for AI systems. The article explores how these attacks work across different mediums and highlights the challenges in defending ML systems against such vulnerabilities.