Analytics Digests Sources Topics RSS AI Crypto

#llm-security News & Analysis

177 articles tagged with #llm-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

177 articles

AIBearisharXiv – CS AI · Jun 17/10

🧠

Automatically Attacking Software Reverse Engineering AI Agents

Researchers demonstrate a novel adversarial attack using genetic algorithm-based prompt injection that can deceive LLM-powered reverse engineering tools like GhidraMCP into misinterpreting binary executables. This vulnerability exploits how large language models process decompiled code through surreptitious string variable assignments, potentially allowing malware to bypass automated detection systems that rely on AI-driven analysis.

AIBearisharXiv – CS AI · Jun 17/10

🧠

The Surface You Test Is Not the Surface That Breaks

Researchers demonstrate that LLM agent vulnerabilities to prompt injection attacks vary dramatically depending on the injection surface used, with the same attack payload showing 96% success on one model via tool outputs but only 4% via tool descriptions. The study reveals that vulnerability is determined by model-surface interaction rather than the injection channel alone, exposing critical blindspots in current AI security evaluation methodology.

🧠 GPT-4

AIBearisharXiv – CS AI · Jun 17/10

🧠

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Researchers reveal a critical vulnerability in LLM agents operating in local workspaces, where attackers can plant hidden prompt injections across multiple steps to gain persistent control. The new ClawTrojan benchmark demonstrates 95.5% attack success rates against GPT-5.4, while a proposed defense mechanism called DASGuard offers runtime protection by tracing and sanitizing potentially malicious control text in sensitive files.

🧠 GPT-5

AIBullisharXiv – CS AI · May 297/10

🧠

Provably Secure Agent Guardrail

Researchers propose Proof-Constrained Action (ePCA), a formal verification framework that requires AI agents to express intentions as mathematical constraints before executing actions, eliminating reliance on semantic guardrails. The approach achieves zero attack success rates in testing and addresses critical security gaps as LLMs evolve from text generators into autonomous agents with real-world execution capabilities.

AIBearisharXiv – CS AI · May 297/10

🧠

GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization

Researchers introduce GEO-Bench, a standardized benchmark for evaluating ranking manipulation attacks against large language models used in generative search. The study compares black-box and white-box adversarial attacks, revealing that simpler content-rewriting methods can match gradient-based approaches while remaining more difficult to detect.

🏢 Perplexity🧠 Llama

AIBearisharXiv – CS AI · May 297/10

🧠

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

Researchers present MemPoison, a novel attack that exploits vulnerabilities in large language model agents by injecting malicious information into their long-term memory through dialogue interactions. The attack achieves up to 95% success rates by using semantic bridges, entity masquerading, and embedding optimization to bypass modern selective memory mechanisms, revealing critical security gaps in autonomous AI systems.

AIBearisharXiv – CS AI · May 297/10

🧠

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Researchers demonstrate that LoRA adapters, widely used for fine-tuning large language models, can be backdoored through training data poisoning while maintaining clean performance. The backdoor generalizes at the token level rather than structural patterns, making it harder for defenders to detect generically. Two complementary detection methods—behavioral probing and weight-level analysis—successfully identify poisoned adapters without false positives.

AIBearisharXiv – CS AI · May 297/10

🧠

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Researchers introduce SafeSearch, an automated red-teaming framework that identifies critical vulnerabilities in LLM-based search agents by testing them against 300 adversarial cases spanning misinformation, prompt injection, and other risks. The study reveals that current search agents achieve attack success rates up to 90.5%, with common defenses like reminder prompting providing minimal protection.

🧠 GPT-4

AIBearisharXiv – CS AI · May 297/10

🧠

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

A comprehensive arXiv research review examines vulnerabilities in Large Language Models, particularly prompt injection and jailbreaking attacks, while analyzing existing defense mechanisms. The study identifies critical security gaps and proposes future research directions for safer LLM deployment across applications.

AIBearisharXiv – CS AI · May 297/10

🧠

Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

Researchers present an empirical study revealing that Large Language Models struggle with cyber threat intelligence (CTI) tasks due to domain-specific vulnerabilities rather than generic AI failures. The study identifies three failure modes—spurious correlations, contradictory knowledge, and constrained generalization—and proposes targeted defenses to improve LLM reliability in security operations.

AIBearisharXiv – CS AI · May 297/10

🧠

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

Researchers conducted the first systematic study of prompt injection attacks in real-world LLM-based resume screening, analyzing approximately 200,000 resumes from hireEZ. They found that ~1% of resumes contain hidden prompt injections, with prevalence increasing significantly over the past 1-2 years, and discovered that over 90% of injected prompts use subtle methods rather than explicit instructions.

AIBearisharXiv – CS AI · May 297/10

🧠

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Researchers conducted 400 autonomous penetration testing runs across four LLM models against a fixed vulnerable target to measure attack consistency. Results show significant variation in exploitation success rates (25-85%) and distinctive failure modes per model, with Claude and Gemini 2.5 Flash-Lite substantially outperforming GPT-4o-mini and Qwen, raising critical questions about LLM reliability in security-critical autonomous operations.

🏢 Anthropic🧠 GPT-4🧠 Claude

AIBullisharXiv – CS AI · May 287/10

🧠

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Researchers propose the Adversarial Prompt Disentanglement (APD) framework, a defense mechanism that identifies and neutralizes malicious components in LLM inputs before processing. The system combines semantic decomposition, graph-based intent classification, and transformer-based detection to reduce harmful outputs by over 85% while maintaining model performance.

AIBearisharXiv – CS AI · May 287/10

🧠

Plant, Persist, Trigger: Sleeper Attack on Large Language Model Agents

Researchers have identified a new vulnerability in LLM-based agents called 'Sleeper Attacks,' where adversarial content persists dormant in agent state across multiple interactions before being activated by benign queries. The attack threatens real-world LLM deployments by evading single-interaction detection mechanisms, with testing showing vulnerabilities across seven major language models.

AIBearisharXiv – CS AI · May 287/10

🧠

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

Researchers introduce MM-DeceptionBench, the first benchmark for evaluating deceptive behaviors in multimodal AI systems, and propose a novel "debate with images" detection method that significantly improves identification of deliberate misleading strategies combining visual and textual elements.

🧠 GPT-4

AIBearisharXiv – CS AI · May 277/10

🧠

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Researchers have identified alignment tampering, a critical vulnerability in RLHF (Reinforcement Learning from Human Feedback) where LLMs can exploit the alignment process itself by influencing preference datasets to amplify biases. The technique demonstrates how quality-biased outputs can be preferred by annotators, causing reward models to inherit and optimize for misaligned behaviors across diverse domains including propaganda and brand promotion.

AIBearisharXiv – CS AI · May 277/10

🧠

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

Researchers introduce MemMorph, a novel attack method that compromises LLM-driven agents by poisoning their long-term memory modules rather than manipulating tool metadata. The attack achieves up to 85.9% success rates by injecting crafted records disguised as technical facts, exposing a critical security vulnerability in memory-augmented AI systems that existing defenses fail to address.

AIBearisharXiv – CS AI · May 277/10

🧠

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

A comprehensive survey examines Pretraining Data Exposure (PDE) in large language models, unifying two previously isolated research areas—membership inference and data contamination—to assess whether specific data appeared in LLM training datasets. The work formalizes exposure levels, reviews attack and defense mechanisms, and highlights privacy and evaluation integrity risks as model sizes and training data scales continue to grow.

AIBearisharXiv – CS AI · May 277/10

🧠

Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

Researchers demonstrate BITE, a black-box adversarial attack framework that exploits stylistic biases in LLM judges to artificially inflate evaluation scores while preserving semantic meaning. The attack achieves over 65% success rates across diverse LLM judges and tasks, exposing fundamental vulnerabilities in using language models for objective evaluation.

AIBearisharXiv – CS AI · May 277/10

🧠

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Researchers have identified a new data poisoning vulnerability in large language models called 'covert control attacks' that uses semantic associations to hide malicious instructions rather than obvious trigger phrases. This method successfully evades existing backdoor and prompt injection defenses, maintaining up to 98% attack success rates and outperforming traditional poisoning techniques by 40%.

AIBearisharXiv – CS AI · May 277/10

🧠

Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

Researchers red-teamed ChatGPT and Claude Opus as TEE security advisors, finding both LLMs hallucinate mechanisms and overclaim guarantees in sensitive infrastructure guidance. The study demonstrates some failure patterns transfer across models (up to 12%) and proposes an 80.62% failure reduction through policy gating, retrieval grounding, and verification checks.

🧠 ChatGPT🧠 Claude

AIBearisharXiv – CS AI · May 127/10

🧠

Seed Hijacking of LLM Sampling and Quantum Random Number Defense

Researchers demonstrate SeedHijack, a supply-chain attack exploiting pseudorandom number generators in LLM sampling to inject arbitrary tokens without modifying model weights, achieving 99.6% success rates across multiple models. A quantum random number generator-based defense is proposed that neutralizes the attack with minimal performance overhead.

AIBullisharXiv – CS AI · May 127/10

🧠

PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines

Researchers introduce PRISM, a real-time defense system that detects and prevents credential leakage in multi-agent LLM pipelines by monitoring generation dynamics at the token level. The system achieves 83.2% F1 score with perfect precision, eliminating observed leakage while maintaining output quality across adversarial benchmarks.

AINeutralarXiv – CS AI · May 127/10

🧠

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

Researchers evaluated six defense mechanisms against persistent memory attacks on LLM agents, finding that most input and retrieval-level defenses fail to prevent malicious instruction execution stored in agent memory. Only Memory Sandbox, a memory-layer tool-gating approach, effectively blocked attacks across eight of nine models while maintaining zero utility cost, though it paradoxically increased attack success in one reasoning model by forcing reliance on alternative execution pathways.

AIBearisharXiv – CS AI · May 127/10

🧠

The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring

Researchers present a comprehensive framework for systematically generating, categorizing, and evaluating jailbreak attacks against large language models, introducing a dataset of 114,000 adversarial prompts, automated generation methods, and a novel continuous evaluation metric (OPTIMUS) that surpasses binary success rate measurements.

🏢 Perplexity

← PrevPage 3 of 8Next →

Tag Connections

#geopolitical↔#iran

298

#iran↔#market

222

171

#geopolitical↔#market

143

143

#fed↔#inflation

107

#bitcoin↔#market

105

#iran↔#security

94

83

#market↔#trump

81

Tag Sentiment

#market1332 articles

#ai1002 articles

#iran866 articles

#geopolitical528 articles

#bitcoin403 articles

#trump326 articles

#security278 articles

#inflation240 articles

#fed204 articles

#trading201 articles

BullishNeutralBearish

◆ AI Mentions

🏢OpenAI

124×

🏢Anthropic

84×

🏢Nvidia

67×

🧠Claude

54×

🧠GPT-5

41×

🧠Gemini

38×

🧠ChatGPT

25×

🏢Meta

21×

🧠Grok

16×

🏢Google

13×

🏢Hugging Face

12×

🧠GPT-4

12×

🏢xAI

10×

🏢Perplexity

9×

🧠Opus

8×

🧠Llama

8×

🧠Sonnet

5×

🏢Microsoft

5×

🧠Copilot

2×

🧠Stable Diffusion

1×

Stay Updated

Everything combined

▲ Trending Tags

1#market1332 2#ai1002 3#iran866 4#geopolitical528 5#bitcoin403 6#trump326 7#security278 8#inflation240 9#fed204 10#trading201 11#adoption161 12#stablecoin145 13#china141 14#institutional128 15#ethereum125

Filters

Sentiment

Importance

Sort

📡 See all 70+ sources

y0.exchange

Your AI agent for DeFi

Connect Claude or GPT to your wallet. AI reads balances, proposes swaps and bridges — you approve. Your keys never leave your device.

8 MCP tools · 15 chains · $0 fees

Connect Wallet to AI →How it works →

Viewing: y0 Digest feed