
Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection

arXiv – CS AI|Mudit Sinha, Sanika Chavan|
🤖AI Summary

Researchers demonstrated a novel prompt-injection attack that bypasses text-based LLM defenses by encoding malicious payloads as floating-point parameters and reconstructing them as fragmented telemetry. Testing across three commercial LLM APIs showed a 94.3% attack success rate against leading defenses such as Prompt Guard 2, revealing a critical gap in structured-input security.

Analysis

This research exposes a fundamental vulnerability in how large language models validate inputs when structured data flows through their systems. The attack works by disguising malicious prompts as numeric parameters, a data type that traditional text inspectors ignore, and then reconstructing them inside the model's execution context. The approach succeeded against defenses specifically designed to catch prompt injections, demonstrating that security layers examining only visible text miss entire attack surfaces.

The vulnerability stems from architectural decisions in modern LLM deployments. Commercial APIs often accept structured inputs (JSON, arrays, floats) alongside text, then reconstruct or process these components in ways that downstream systems cannot fully inspect. This separation between input validation and internal reconstruction creates a blind spot. The researchers ran 14,400 real-world trials spanning multiple providers, which points to a reproducible, provider-agnostic failure mode rather than an isolated edge case.
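To make the blind spot concrete, here is a minimal sketch of an inventory pass over a structured request. It lists every all-numeric array in a JSON payload, which is the kind of channel a text-only filter would never look at. The request shape and the `numeric_channels` helper are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import json
from typing import Any, Iterator, List, Tuple


def numeric_channels(node: Any, path: str = "$") -> Iterator[Tuple[str, List[float]]]:
    """Yield (path, values) for every non-empty list made up entirely of numbers."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from numeric_channels(value, f"{path}.{key}")
    elif isinstance(node, list):
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = bool(node) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in node
        )
        if is_numeric:
            yield path, [float(v) for v in node]
        else:
            for index, value in enumerate(node):
                yield from numeric_channels(value, f"{path}[{index}]")


# Hypothetical request: one text field plus several numeric arrays.
request = json.loads(
    '{"prompt": "Summarize the sensor log.",'
    ' "telemetry": {"readings": [0.5, 1.25, 3.0], "tags": ["a", "b"]},'
    ' "batches": [[1.0, 2.0], [3.5]]}'
)

channels = list(numeric_channels(request))
for channel_path, values in channels:
    print(channel_path, values)
```

A filter that only inspects `prompt` sees one of the four data-bearing fields in this request. An inventory like this is only a starting point for an audit; it says nothing about whether any array is malicious.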

For developers and organizations deploying LLM APIs, this research highlights the inadequacy of text-only defenses in production environments. Security teams cannot assume that validated text inputs represent the complete threat surface. The attack also challenges assumptions about defense layering: strong ensemble defenses failed because they all shared the same fundamental limitation of inspecting only human-readable text.

The mitigation path forward involves semantic validation of reconstructed data and detection mechanisms that operate across all input channels, not just text. The researchers note that simple detectors like xxd can partially block current variants, suggesting defenses are possible but require architectural changes. Organizations should audit their input pipelines to identify where structured data undergoes reconstruction without full validation.
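As an illustration of validating numeric channels, the sketch below decodes a float array under two simple schemes and flags it when the result is mostly printable text. Both schemes are assumptions made for this example (the summary does not describe the paper's actual encoding), and the function names, `threshold`, and `min_len` are hypothetical.

```python
import math
import struct
from typing import Optional, Sequence

# Printable ASCII plus tab, newline, and carriage return.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def decode_as_code_points(values: Sequence[float]) -> Optional[bytes]:
    """Assumed scheme A: each float is an integer-valued ASCII code point."""
    if not all(math.isfinite(v) and 0 <= v < 128 and v == int(v) for v in values):
        return None
    return bytes(int(v) for v in values)


def decode_as_float32_bytes(values: Sequence[float]) -> Optional[bytes]:
    """Assumed scheme B: raw bytes packed four at a time into float32 values."""
    try:
        return struct.pack(f"<{len(values)}f", *values)
    except (OverflowError, struct.error):
        return None  # value does not fit in a float32


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_hidden_text(
    values: Sequence[float], threshold: float = 0.9, min_len: int = 8
) -> Optional[str]:
    """Return decoded text if any assumed scheme yields mostly printable bytes."""
    for decode in (decode_as_code_points, decode_as_float32_bytes):
        data = decode(values)
        if data is not None and len(data) >= min_len and printable_ratio(data) >= threshold:
            return data.decode("ascii", errors="replace")
    return None


# Benign round trip: pack a harmless string into three float32 values.
carrier = list(struct.unpack("<3f", b"hello world!"))
print(looks_like_hidden_text(carrier))
print(looks_like_hidden_text([0.5, 1.25, 3.0]))
```

A check like this covers only the encodings it knows about, and scheme A will produce false positives on legitimate arrays of small whole numbers, so it is best treated as one signal among several rather than a complete defense.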

Key Takeaways
  • Malicious prompts encoded as floating-point arrays achieved a 94.3% attack success rate against text-based LLM defenses across three commercial APIs
  • Attack succeeds because defenses inspect only visible text while ignoring structured numeric inputs that are reconstructed internally
  • Prompt Guard 2 and ensemble classifiers fail because they share the same fundamental weakness: incomplete input-channel coverage
  • Mitigation requires validation at reconstruction layers and semantic inspection of all input types, not text-only filtering
  • Vulnerability affects production LLM pipelines from multiple providers, making it a systemic architecture problem rather than a vendor-specific flaw