PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines
Researchers introduce PRISM, a real-time defense system that detects and prevents credential leakage in multi-agent LLM pipelines by monitoring generation dynamics at the token level. The system reports an F1 score of 83.2% with perfect precision and no observed leakage on adversarial benchmarks, while maintaining output quality.
PRISM addresses a critical vulnerability in enterprise AI deployments where sensitive information flows through multi-agent systems. When one LLM agent accesses confidential data, that data enters the shared context, and downstream agents can inadvertently reproduce secrets in their own outputs. The result is a cascading risk that existing defenses fail to mitigate. Traditional safeguards act too late in the pipeline: they rely on pattern matching after generation completes, which adds latency and leaves blind spots.
The innovation lies in PRISM's temporal approach: it analyzes generation dynamics during token-by-token decoding rather than filtering output after the fact. By combining 16 signals, including entropy metrics, logit concentration, and identifier patterns, the system detects the measurable shift in generation behavior that precedes credential reproduction. This enables intervention before a secret is fully reconstructed, and it treats credential leakage as a sequential risk accumulation problem rather than a static pattern-matching challenge.
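To make two of the named signals concrete, here is a minimal sketch of per-step entropy and logit concentration computed from raw logits. It is an illustration only, not PRISM's implementation: the function names (`token_entropy`, `top1_concentration`) and the toy four-token vocabulary are assumptions, and the source does not say how the 16 signals are actually defined or combined.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def top1_concentration(logits):
    """Probability mass on the single most likely token."""
    return max(softmax(logits))

# Hypothetical decoding steps over a toy 4-token vocabulary.
flat_step = [0.0, 0.0, 0.0, 0.0]     # model is undecided
peaked_step = [12.0, 0.0, 0.0, 0.0]  # model is nearly certain

for name, step in [("flat", flat_step), ("peaked", peaked_step)]:
    print(name, round(token_entropy(step), 4), round(top1_concentration(step), 4))
```

When a model copies a memorized string such as a credential, one would expect steps to look like `peaked_step`: low entropy and high top-1 mass. A detector in this style would watch for a sustained run of such steps rather than a single one.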
For organizations deploying autonomous agent systems, PRISM represents a significant advance in securing sensitive workflows. The benchmark results demonstrate practical feasibility: zero observed leakage across 2,000 adversarial tasks while preserving 89.3% output utility, substantially outperforming baseline defenses. Perfect precision means the system produced no false positives on the benchmark, so it did not interrupt legitimate operations.
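The reported F1 and precision together pin down the detector's recall, which the summary does not state. The sketch below is plain arithmetic from the standard F1 definition, F1 = 2PR / (P + R); the helper name `recall_from_f1` is hypothetical. How the implied recall relates to the zero-leakage result depends on what unit detection is scored over, which the source does not specify.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for recall R."""
    return f1 * precision / (2 * precision - f1)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures reported in the article: F1 = 83.2%, precision = 100%.
implied_recall = recall_from_f1(0.832, 1.0)
print(f"implied recall: {implied_recall:.3f}")

# Round-trip check: plugging the recall back in recovers the reported F1.
print(f"recomputed F1:  {f1_score(1.0, implied_recall):.3f}")
```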
Deployment implications extend beyond security to operational efficiency. Because intervention happens in real time at the token level, there is no separate post-generation filtering pass and no latency penalty from one, which lets PRISM be integrated into production pipelines without performance degradation. As multi-agent LLM systems proliferate in enterprise environments handling financial data, healthcare records, and proprietary information, such runtime defenses become essential infrastructure rather than optional features.
- PRISM detects credential leakage during LLM generation by monitoring entropy collapse and logit concentration rather than relying on post-hoc pattern matching.
- It achieves an 83.2% F1 score with 100% precision and zero observed leakage across 2,000 adversarial tasks in multi-agent pipelines.
- It combines 16 contextual, behavioral, and information-theoretic signals into calibrated risk zones that enable per-token intervention.
- It preserves 89.3% output utility while preventing secret reconstruction, outperforming existing defenses by 11.3% in F1.
- Its real-time, token-level approach eliminates post-generation latency, enabling production deployment in enterprise AI systems that handle sensitive data.
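The idea of accumulating risk over a sequence and mapping it to zones for per-token intervention can be sketched as follows. Everything specific here is an assumption made for illustration: the decayed running sum, the `decay` value, the zone names, and the `watch` and `block` thresholds are hypothetical and are not taken from PRISM, whose calibration procedure the summary does not describe.

```python
def update_risk(prev_risk, step_score, decay=0.8):
    """Decayed running sum: recent suspicious tokens weigh more than old ones."""
    return decay * prev_risk + step_score

def zone(risk, watch=1.0, block=2.0):
    """Map accumulated risk to a hypothetical intervention zone."""
    if risk >= block:
        return "block"
    if risk >= watch:
        return "watch"
    return "safe"

def monitor(step_scores, decay=0.8):
    """Walk per-token scores in order and stop at the first 'block' zone."""
    risk = 0.0
    zones = []
    for score in step_scores:
        risk = update_risk(risk, score, decay)
        current = zone(risk)
        zones.append(current)
        if current == "block":
            break  # intervene before the remaining tokens are emitted
    return zones

# Hypothetical per-token scores: benign start, then a sustained suspicious run.
print(monitor([0.1, 0.1, 0.9, 0.9, 0.9, 0.1]))
# → ['safe', 'safe', 'watch', 'watch', 'block']
```

The point of accumulating rather than thresholding each token alone is that a single high score only reaches the intermediate zone, while a sustained run crosses the blocking threshold before the sequence finishes.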