y0news

#ai-vulnerabilities News & Analysis

10 articles tagged with #ai-vulnerabilities. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Mar 27 · 7/10

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Research shows that LLM system prompt configuration can create serious security vulnerabilities: the same model's phishing detection rate ranged from 1% to 97% depending solely on prompt design. The PhishNChips study demonstrates that more specific prompts can paradoxically weaken AI security by replacing robust multi-signal reasoning with exploitable single-signal dependencies.

AI · Neutral · OpenAI News · Mar 25 · 7/10

Introducing the OpenAI Safety Bug Bounty program

OpenAI has launched a Safety Bug Bounty program designed to identify and address AI safety risks and potential abuse vectors. The program specifically targets vulnerabilities including agentic risks, prompt injection attacks, and data exfiltration threats.

🏢 OpenAI

AI · Bearish · arXiv – CS AI · Mar 16 · 7/10

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Researchers have released MalURLBench, the first benchmark to evaluate how LLM-based web agents handle malicious URLs, revealing significant vulnerabilities across 12 popular models. The study found that existing AI agents struggle to detect disguised malicious URLs and proposed URLGuard as a defensive solution.

AI · Bearish · arXiv – CS AI · Mar 11 · 7/10

NetDiffuser: Deceiving DNN-Based Network Attack Detection Systems with Diffusion-Generated Adversarial Traffic

Researchers developed NetDiffuser, a framework that uses diffusion models to generate natural adversarial examples capable of deceiving AI-based network intrusion detection systems. It achieved attack success rates up to 29.93% higher than baseline attacks, highlighting significant vulnerabilities in current deep learning-based security systems.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 3

Untargeted Jailbreak Attack

Researchers have developed a new 'untargeted jailbreak attack' (UJA) that can compromise the safety systems of large language models with a success rate above 80% using only 100 optimization iterations. This gradient-based method expands the search space by maximizing the probability of an unsafe response rather than optimizing toward a fixed target response, outperforming existing attacks by over 30%.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 7

CaptionFool: Universal Image Captioning Model Attacks

Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.

AI · Bearish · IEEE Spectrum – AI · Jan 21 · 6/10 · 5

Why AI Keeps Falling for Prompt Injection Attacks

Large language models (LLMs) remain highly vulnerable to prompt injection attacks, in which specific phrasing overrides safety guardrails and causes AI systems to perform forbidden actions or reveal sensitive information. Unlike humans, who use contextual judgment and layered defenses, current LLMs lack the ability to assess situational appropriateness and cannot reliably block all such attacks.

AI · Bearish · OpenAI News · Feb 24 · 6/10 · 5

Attacking machine learning with adversarial examples

Adversarial examples are specially crafted inputs designed to fool machine learning models into making incorrect predictions, functioning like optical illusions for AI systems. The article explores how these attacks work across different media and highlights the challenges of defending ML systems against them.