🧠 AI | ⚪ Neutral | Importance 6/10

Hidden-State Privacy Has an Empty Middle

arXiv – CS AI|Alexander Okezue Bell|May 27, 2026 at 04:00 AM

🤖 AI Summary

Researchers demonstrate that Gaussian mechanisms for hidden-state privacy face a fundamental trade-off, with no configurations achieving both moderate utility and moderate privacy against adaptive attackers. A diagonal inverse-Fisher mechanism emerges as minimax-optimal but sits at the privacy-utility boundary rather than within an achievable middle ground, suggesting future work must redesign architectures rather than optimize within existing Gaussian frameworks.

Analysis

This research addresses a critical gap in machine learning privacy, specifically how neural network hidden states can be released without exposing sensitive information to adaptive adversaries. The study evaluates 1,536 different Gaussian covariance configurations and finds none simultaneously satisfy practical privacy and utility requirements—an empirical result the authors support with theoretical Fisher-ball lower bounds proving this limitation is fundamental, not accidental.

The work builds on growing concerns about model inversion attacks, where adversaries reconstruct training data or sensitive intermediate representations from model outputs. Prior approaches assumed privacy-utility trade-offs could be navigated within Gaussian release mechanisms. This research proves otherwise: any mechanism maintaining constant Fisher utility necessarily creates directions where an attacker's Mahalanobis signal grows linearly with hidden state width, making uniform protection impossible.
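The width dependence described above can be illustrated with a toy calculation. The sketch below is not the paper's proof. It assumes a simplified setting in which the Fisher matrix is the identity, so "constant Fisher utility" reduces to a fixed trace budget on the noise covariance; the function names and the budget value are hypothetical. Under that assumption the smallest noise eigenvalue can be at most budget / d, so a unit-norm perturbation along that eigenvector has a squared Mahalanobis signal of at least d / budget.

```python
import numpy as np


def worst_direction_signal(sigma: np.ndarray) -> float:
    """Largest squared Mahalanobis signal of a unit-norm perturbation.

    For noise covariance `sigma`, the maximum of v^T sigma^{-1} v over unit
    vectors v is 1 / lambda_min(sigma), attained at the least-noisy eigenvector.
    """
    eigvals = np.linalg.eigvalsh(sigma)  # ascending order
    return 1.0 / eigvals[0]


def random_covariance(d: int, budget: float, rng: np.random.Generator) -> np.ndarray:
    """Random symmetric positive-definite covariance rescaled so trace == budget."""
    a = rng.standard_normal((d, d))
    sigma = a @ a.T + 1e-3 * np.eye(d)
    return sigma * (budget / np.trace(sigma))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    budget = 1.0  # assumed stand-in for "constant Fisher utility" (F = I)
    for d in (8, 32, 128):
        isotropic = np.eye(d) * (budget / d)
        shaped = random_covariance(d, budget, rng)
        print(d, worst_direction_signal(isotropic), worst_direction_signal(shaped))
```

In this toy setting isotropic noise is the best case, and even it leaves a worst-direction signal that scales linearly with the width d. Any other covariance with the same trace does at least as badly in some direction.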

The proposed diagonal inverse-Fisher mechanism ($\Sigma^\star_{\mathrm{diag}}$) achieves theoretical optimality but occupies the privacy-utility boundary rather than the interior, achieving near-perfect defense (top-1 error <0.001) only by sacrificing model utility. Transformer experiments suggest that architectural co-design offers an escape: split-memory transformers trained from scratch maintain a 6-24x privacy advantage over standard baselines while preserving competitive language modeling performance.
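This summary does not give the exact definition of $\Sigma^\star_{\mathrm{diag}}$. As a rough illustration of what a diagonal inverse-Fisher Gaussian release could look like, the sketch below assumes per-coordinate noise variances proportional to the inverse of the diagonal Fisher entries, scaled so that the Fisher-weighted noise cost sums to a chosen budget. The function names, the budget parameter, and the scaling rule are all hypothetical and may differ from the paper's construction.

```python
import numpy as np


def diag_inverse_fisher_variances(fisher_diag, budget: float) -> np.ndarray:
    """Per-coordinate noise variances proportional to 1 / F_ii (assumed form).

    Scaled so that sum_i F_ii * var_i == budget, which means every coordinate
    contributes the same Fisher-weighted cost, budget / d.
    """
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    if np.any(fisher_diag <= 0):
        raise ValueError("diagonal Fisher entries must be strictly positive")
    return (budget / fisher_diag.size) / fisher_diag


def release(hidden: np.ndarray, variances: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Gaussian release: hidden state plus independent per-coordinate noise."""
    noise = rng.standard_normal(hidden.shape) * np.sqrt(variances)
    return hidden + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fisher_diag = np.array([1.0, 4.0, 0.25, 2.0])  # made-up values for illustration
    variances = diag_inverse_fisher_variances(fisher_diag, budget=2.0)
    hidden = np.zeros(4)
    print(variances)
    print(release(hidden, variances, rng))
```

Coordinates with high Fisher information, where noise hurts utility most, receive the least noise. Under a fixed budget this is also where an attacker's signal is least masked, which is consistent with the boundary behavior described above.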

This reframes the problem fundamentally. Rather than tuning Gaussian parameters, practitioners must either accept the boundary trade-off or explore architectural modifications that inherently resist inversion attacks. The collapse of sequence inversion attacks from 94% to 0% demonstrates the effectiveness of $\Sigma^\star_{\mathrm{diag}}$, though at significant computational cost.

Key Takeaways

→ No Gaussian mechanism achieves both moderate privacy and moderate utility against adaptive attackers across 1,536 tested configurations
→ Theoretical lower bounds prove the privacy-utility gap is fundamental and cannot be avoided through parameter tuning
→ The diagonal inverse-Fisher mechanism is minimax-optimal but sits at the privacy-utility boundary, so practical deployment requires architectural co-design
→ Split-memory transformers reach a 6-24x privacy advantage over standard baselines while maintaining competitive language modeling performance
→ Hidden-state privacy research must shift from designing mechanisms within the Gaussian family to redesigning architectures