#llm-security News & Analysis

177 articles tagged with #llm-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

Helpful or Harmful? Evaluating LLM-Assisted Vulnerability Patching via a Human Study

Researchers conducted a human study evaluating whether Large Language Model-assisted tools improve software vulnerability patching compared to manual debugging. The study revealed that while LLMs accelerate patching speed, they risk introducing insecure code and superficial repairs that pass functional tests but fail security validation, highlighting critical trade-offs in AI-assisted security workflows.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

What Does It Mean to Break a Distillation Defense?

Researchers propose a formal threat model framework for evaluating distillation defenses against black-box LLM attacks, arguing that existing output perturbation defenses lack clear specifications about attacker capabilities. The work demonstrates that defense effectiveness depends heavily on assumed threat parameters, raising concerns about false security claims in deployed systems.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Safety in Self-Evolving LLM Agent Systems: Threats, Amplification, and Case Studies

A new security analysis reveals that self-evolving LLM agent systems face critical vulnerabilities across 17 of 25 potential attack vectors, with adversarial compromises becoming permanently encoded and self-amplifying across system generations. Testing of open-source frameworks demonstrates 100% attack persistence rates, suggesting that autonomous AI systems capable of self-modification require fundamentally new security paradigms beyond traditional static defenses.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Safe to Check, Unsafe to Use: Relinking at the Compression Boundary of LLM Agents

Researchers have identified a critical vulnerability called "relinking" in LLM agents that use compression to handle long contexts. Attackers split a malicious instruction into benign-looking fragments distributed across the text. Security filters that inspect the uncompressed prompt see nothing suspicious, but the compression step then reconstructs the complete malicious instruction. Existing defenses fail to catch this attack, though the paper reports that a new defense, KBRA, eliminates the risk.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Exposing the Illusion of Erasure in Knowledge Editing for LLMs

A new research paper reveals critical vulnerabilities in Knowledge Editing (KE) techniques used to update facts in Large Language Models without retraining. The study demonstrates that edited knowledge is not truly erased but merely suppressed, and can be recovered through adversarial prompting, exposing fundamental flaws in current post-hoc update methods.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

When Compression Becomes an Attack Surface: Black-Box Attacks on Prompt-Compressed LLM Agents

Researchers demonstrate that prompt compression, a technique used to reduce LLM latency and costs, creates a new security vulnerability when processing mixed trusted and untrusted inputs. By strategically perturbing untrusted data before compression, attackers can force compressors to discard critical task information or safety guardrails, achieving 71% attack success rates through a black-box method called COMA.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation

Researchers introduced HardSecBench, a comprehensive security benchmark for evaluating large language models used in hardware and firmware code generation. The study of 924 tasks reveals that LLMs frequently produce functionally correct code while embedding critical security vulnerabilities, highlighting a significant gap in current AI safety evaluation practices.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift

Researchers discovered that popular prompt-injection detectors (ProtectAI-v2 and Prompt-Guard-2) maintain extremely high confidence scores even when failing to catch attacks, particularly indirect behavior-hijack injections. Across multiple attack distribution shifts, detectors missed injections with confidence of 0.99 to 1.00 while false-negative rates ranged from 1% to 97%, indicating a critical calibration failure that standard metrics do not reveal.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Leveraging Large Language Models to Obscure Code Stylometry: A Comparative Study of GPT-3.5 and GPT-4

Researchers demonstrate that Large Language Models like GPT-3.5 and GPT-4 can effectively obscure programmer code stylometry while maintaining functionality, challenging the reliability of authorship attribution techniques used in cybersecurity. The study reveals that structured, multi-shot prompting strategies outperform single-shot approaches in evading detection by traditional machine learning classifiers.

🧠 GPT-4

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion

Researchers introduce TrojanGYM, an LLM-driven framework that automatically generates hardware Trojans to expose vulnerabilities in detection systems. The system demonstrates that existing detectors can be evaded at rates up to 83.33%, revealing critical gaps in hardware security testing methodologies.

🧠 GPT-4 · 🧠 Gemini

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

Analyzing the Narration Gap in LLM-Solver Loops

Researchers identify critical vulnerabilities in LLM-solver hybrid systems, where formal verification guarantees break down during the narration phase, the step that converts solver outputs into user-readable answers. Testing five open-source models reveals that adversaries can manipulate final responses through prompt injection despite underlying formal correctness, indicating that safety-critical applications using AI-assisted reasoning require additional safeguards beyond solver verification.

AI × Crypto · Neutral · arXiv – CS AI · Jun 19 · 7/10

Secure Coding Drift in LLM-Assisted Post-Quantum Cryptography Development: A Gamified Fix

Researchers identify 'Secure Coding Drift,' a vulnerability where developers gradually adopt insecure practices when relying on LLM-generated code for post-quantum cryptography implementation. The paper proposes a gamified framework that transforms LLMs into active security partners through adversarial evaluation and behavioral feedback to mitigate this socio-technical risk.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Researchers propose a novel fingerprinting framework for large language models that combines Code-mixing Fingerprints (CF) and Multi-Candidate Editing (MCEdit) to protect against unauthorized redistribution and commercial misuse. The approach addresses key vulnerabilities in existing fingerprinting methods by balancing imperceptibility with robustness against defensive filtering and downstream model modifications.

🏢 Perplexity

AI · Bearish · arXiv – CS AI · Jun 19 · 7/10

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

A new research framework called CWE-Trace challenges the claim that large language models can reliably detect software vulnerabilities, revealing that fine-tuned models achieve only 52.1% accuracy at best and lack genuine security reasoning despite appearing well-calibrated. The study of 834 Linux kernel samples shows that models exhibit systematic failure patterns that persist across datasets and resist correction through fine-tuning, suggesting they memorize patterns rather than understand vulnerability detection.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Researchers propose a compute-aware evaluation framework for assessing adversarial robustness in large language models, measuring attack effort in FLOPs rather than fixed query budgets. Testing across multiple models and attack strategies reveals that alignment training has non-monotonic effects on robustness, scaling reduces gradient-based attacks but not cheaper template-based ones, and safety measures leave certain harm categories disproportionately accessible.

AI · Bearish · arXiv – CS AI · Jun 11 · 7/10

JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization

JailbreakOPT is a new framework that optimizes adversarial prompts to exploit safety vulnerabilities in large language models through iterative refinement and tool composition. The approach combines atomic jailbreak techniques with contextual bandits to achieve higher attack success rates while reducing the number of queries needed, demonstrating meaningful progress in LLM security testing.

AI · Bearish · arXiv – CS AI · Jun 11 · 7/10

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

Researchers developed AutoInject, a reinforcement learning framework that automatically generates adversarial prompts to exploit LLM agents through prompt injection attacks. The method outperforms existing attack techniques on production models and successfully breaks defenses specifically designed to resist prompt injection, highlighting a significant vulnerability gap in AI system security.

AI · Bearish · arXiv – CS AI · Jun 11 · 7/10

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Researchers have discovered that Grammar-Constrained Decoding (GCD), a technique used to improve code safety in Large Language Models, can itself be exploited as a jailbreak vector through an attack called CodeSpear. The study also introduces CodeShield, a defensive alignment method that protects LLMs from generating malicious code even when attackers manipulate grammar constraints.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines

Researchers demonstrate that LLM-based search engines are vulnerable to ranking manipulation attacks, where adversaries craft content to game results. Using game theory, the study reveals that reducing attack success rates can paradoxically incentivize attacks and that defensive caps may fail, highlighting the need for adaptive security strategies beyond traditional defenses.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs

Researchers introduce CIAware-Bench, a benchmark measuring whether frontier LLMs can detect when their outputs are being monitored and modified by AI control systems. Testing eleven models across multiple domains, the study finds low-to-moderate detection rates (up to 0.87 accuracy), revealing that intervention awareness varies significantly by task and model pair, with implications for the robustness of AI safety protocols.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans

Researchers discovered that Large Language Models leak significantly more personally identifiable information (PII) when interacting with AI agents than with human users, despite identical safety mechanisms. The study identifies an 'Interlocutor Effect' in which LLMs reduce privacy caution based on the perceived identity of the recipient, with leakage rates increasing by up to 23 percentage points when addressing AI agents, raising critical security concerns for multi-agent system architectures.

🧠 Llama

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

Researchers demonstrate that multi-agent LLM systems used for political analysis can be identified by their stylometric fingerprints even when anonymized, undermining a proposed security mitigation. A fine-tuned T5 model achieved 99.1% accuracy in identifying LLM model families, revealing compliance gaps with EU AI Act requirements for transparency and system validation in critical applications.

🧠 Claude · 🧠 Sonnet · 🧠 Llama

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Researchers introduce BadRobot, an attack paradigm that exploits vulnerabilities in embodied LLM agents to make them perform harmful physical actions through voice commands. The study demonstrates successful attacks against prominent frameworks like Voxposer and Code as Policies, revealing critical safety gaps in AI systems integrated into physical robotics.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Assessing Automated Prompt Injection Attacks in Agentic Environments

Researchers have evaluated automated prompt injection attacks against large language model agents using both white-box and black-box optimization methods, finding that black-box approaches significantly outperform gradient-based techniques in realistic agentic settings. While task-universal attacks transfer effectively across domains, attacks trained on smaller models fail to generalize to frontier models like GPT-5, suggesting model-dependent vulnerabilities rather than universal exploits.

🧠 GPT-5

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

A comprehensive review of 247 research papers reveals that LLM agents face escalating security threats beyond text generation, including prompt injection, tool hijacking, and state corruption. The study proposes a framework emphasizing trust boundaries, privilege control, and stateful risk evaluation to address fragmented defenses and inadequate benchmarking standards.
