#ai-vulnerability News & Analysis

12 articles tagged with #ai-vulnerability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

12 articles

AIBearisharXiv – CS AI · 2d ago7/10

🧠

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts

Researchers have developed an A*-inspired framework that generates obfuscated prompts capable of triggering factual errors in large language models while preserving semantic intent. The method uses a hierarchical rewrite strategy with dynamic semantic dispersion to efficiently create adversarial prompts, demonstrating higher attack success rates than existing approaches and raising urgent concerns about LLM reliability in safety-critical applications.

AIBearishSimon Willison Blog · 2d ago7/10

🧠

Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked

Hackers exploited Meta's AI systems to gain unauthorized access to high-profile Instagram accounts by simply requesting assistance from the company's AI tools. The vulnerability reveals critical security gaps in AI-powered authentication systems and raises concerns about how generative AI can be weaponized to bypass account security measures.

🏢 Meta

AIBearisharXiv – CS AI · 3d ago7/10

🧠

The Surface You Test Is Not the Surface That Breaks

Researchers demonstrate that LLM agent vulnerabilities to prompt injection attacks vary dramatically depending on the injection surface used, with the same attack payload showing 96% success on one model via tool outputs but only 4% via tool descriptions. The study reveals that vulnerability is determined by model-surface interaction rather than the injection channel alone, exposing critical blindspots in current AI security evaluation methodology.

🧠 GPT-4

AIBearisharXiv – CS AI · 3d ago7/10

🧠

Automatically Attacking Software Reverse Engineering AI Agents

Researchers demonstrate a novel adversarial attack using genetic algorithm-based prompt injection that can deceive LLM-powered reverse engineering tools like GhidraMCP into misinterpreting binary executables. This vulnerability exploits how large language models process decompiled code through surreptitious string variable assignments, potentially allowing malware to bypass automated detection systems that rely on AI-driven analysis.

AIBearishDaily Hodl · 4d ago7/10

🧠

Pennsylvania Bank Issues Urgent Alert After AI Application Triggers Data Breach, Exposing Sensitive Customer Info

Community Bank, a Pennsylvania-based financial institution, disclosed a data breach caused by an AI application that exposed customer names, social security numbers, and dates of birth. The breach, reported to the SEC, highlights emerging cybersecurity vulnerabilities in AI-powered banking systems and raises concerns about enterprise AI security practices across the financial sector.

AIBearisharXiv – CS AI · 6d ago7/10

🧠

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

Researchers conducted the first systematic study of prompt injection attacks in real-world LLM-based resume screening, analyzing approximately 200,000 resumes from hireEZ. They found that ~1% of resumes contain hidden prompt injections, with prevalence increasing significantly over the past 1-2 years, and discovered that over 90% of injected prompts use subtle methods rather than explicit instructions.

AIBearisharXiv – CS AI · Apr 107/10

🧠

BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack

Researchers have demonstrated the first multi-targeted backdoor attack against graph neural networks (GNNs) in graph classification tasks, using a novel subgraph injection method that simultaneously redirects multiple predictions to different target labels while maintaining clean accuracy. The attack shows high efficacy across multiple GNN architectures and datasets, with resilience against existing defense mechanisms, exposing significant vulnerabilities in GNN security.

AIBearisharXiv – CS AI · Mar 177/10

🧠

DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems

Researchers developed DECEIVE-AFC, an adversarial attack framework that can significantly compromise AI-based fact-checking systems by manipulating claims to disrupt evidence retrieval and reasoning. The attacks reduced fact-checking accuracy from 78.7% to 53.7% in testing, highlighting major vulnerabilities in LLM-based verification systems.

AIBearisharXiv – CS AI · Mar 117/10

🧠

When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models

Researchers have developed UPA-RFAS, a new adversarial attack framework that can successfully fool Vision-Language-Action (VLA) models used in robotics with universal physical patches that transfer across different models and real-world scenarios. The attack exploits vulnerabilities in AI-powered robots by using patches that can hijack attention mechanisms and cause semantic misalignment between visual and text inputs.

AIBearisharXiv – CS AI · Mar 67/10

🧠

Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models

Researchers discovered a new vulnerability in multimodal large language models where specially crafted images can cause significant performance degradation by inducing numerical instability during inference. The attack method was validated on major vision-language models including LLaVa, Idefics3, and SmolVLM, showing substantial performance drops even with minimal image modifications.

AIBearisharXiv – CS AI · Mar 57/10

🧠

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

Researchers have developed Image-based Prompt Injection (IPI), a black-box attack that embeds adversarial instructions into natural images to manipulate multimodal AI models. Testing on GPT-4-turbo achieved up to 64% attack success rate, demonstrating a significant security vulnerability in vision-language AI systems.

🧠 GPT-4

AIBearisharXiv – CS AI · Mar 37/106

🧠

Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Researchers developed AdvBandit, a new black-box adversarial attack method that can exploit neural contextual bandits by poisoning context data without requiring access to internal model parameters. The attack uses bandit theory and inverse reinforcement learning to adaptively learn victim policies and optimize perturbations, achieving higher victim regret than existing methods.