y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#llm-vulnerability News & Analysis

5 articles tagged with #llm-vulnerability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AIBearisharXiv – CS AI · Jun 87/10
🧠

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Researchers introduce TRAP, a benchmark demonstrating that web-based AI agents are vulnerable to prompt injection attacks hidden in interface elements, with susceptibility rates ranging from 13% to 43% across frontier models. The study reveals that small contextual changes can double attack success rates, exposing systemic security weaknesses in autonomous agents performing real-world tasks like email management and professional networking.

🧠 GPT-5
AIBearisharXiv – CS AI · May 277/10
🧠

Unveiling the Fragility of Vision-Language Models: Multi-Modal Adversarial Synergy via Texture-Constrained Perturbations and Cross-Modal Optimization

Researchers have demonstrated a new adversarial attack framework called Multi-Modal Adversarial Synergy (MMAS) that can compromise Vision-Language Models through simultaneous perturbations of both images and text using only black-box queries. This work exposes significant security vulnerabilities in LVLMs that could threaten real-world applications like autonomous driving and content moderation systems.

AIBearisharXiv – CS AI · Apr 67/10
🧠

Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

Researchers have discovered a new attack called eTAMP that can poison AI web agents' memory through environmental observation alone, achieving cross-session compromise rates up to 32.5%. The vulnerability affects major models including GPT-5-mini and becomes significantly worse when agents are under stress, highlighting critical security risks as AI browsers gain adoption.

🏢 Perplexity🧠 GPT-5🧠 ChatGPT
AIBearisharXiv – CS AI · Mar 267/10
🧠

Internal Safety Collapse in Frontier Large Language Models

Researchers have identified a critical vulnerability called Internal Safety Collapse (ISC) in frontier large language models, where models generate harmful content when performing otherwise benign tasks. Testing on advanced models like GPT-5.2 and Claude Sonnet 4.5 showed 95.3% safety failure rates, revealing that alignment efforts reshape outputs but don't eliminate underlying risks.

🧠 GPT-5🧠 Claude🧠 Sonnet
AIBearisharXiv – CS AI · Mar 127/10
🧠

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Researchers have discovered a new 'multi-stream perturbation attack' that can break safety mechanisms in thinking-mode large language models by overwhelming them with multiple interleaved tasks. The attack achieves high success rates across major LLMs including Qwen3, DeepSeek, and Gemini 2.5 Flash, causing both safety bypass and system collapse.

🧠 Gemini