Researchers show that the top-k token rankings returned by language model APIs act as a model signature that is provably hard to forge. Each model can produce only a characteristic set of top-k orderings, and constructing a different model that reproduces that set is NP-hard. Rankings leak less information than raw logits, but they still let an attacker approximate the model's final-layer parameters. APIs can mitigate this risk by restricting k to sufficiently small values.
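To make the difference between rankings and logits concrete, here is a minimal sketch. It assumes a toy model whose final layer is linear: the logits are `W · h` for a hypothetical unembedding matrix `W` (one row per token) and a final hidden state `h`. The API returns only the ids of the k highest-scoring tokens. The helper names `logits` and `top_k_ranking` and all the numbers are illustrative and do not come from the paper.

```python
# Sketch: how a ranking-only API output is derived from logits.
# Hypothetical toy setup: W is a tiny unembedding matrix (vocab_size x hidden_dim)
# and h is a final hidden state; real models are vastly larger.
from typing import List, Sequence


def logits(W: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Final-layer logits: one dot product per vocabulary row of W."""
    return [sum(w_j * h_j for w_j, h_j in zip(row, h)) for row in W]


def top_k_ranking(scores: Sequence[float], k: int) -> List[int]:
    """Token ids of the k highest scores, best first (ties broken by lower id)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


W = [[1.0, 0.0], [0.0, 1.0], [0.5, 1.5], [-1.0, 0.5]]
h = [2.0, 1.0]
z = logits(W, h)            # [2.0, 1.0, 2.5, -1.5]
print(top_k_ranking(z, 3))  # [2, 0, 1]

# Positively rescaling and shifting the logits leaves the ranking unchanged,
# so a ranking-only API hides magnitudes that a logit API would reveal.
z_shifted = [3.0 * s + 10.0 for s in z]
print(top_k_ranking(z_shifted, 3) == top_k_ranking(z, 3))  # True
```

The same holds for any strictly increasing transformation of the logits. This is why a ranking carries strictly less information than the logits it came from.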
This research addresses a critical security gap in language model API design. As organizations deploy LLMs through APIs that expose token probabilities, attackers have exploited that leakage to extract model parameters. The study shows that even more restrictive APIs, which reveal only token rankings rather than probability scores, still leak an exploitable signature. It also shows that this signature has a useful security property: it is hard to forge.
The unforgeability property matters because finding a model with the same set of feasible rankings (the orderings a model is able to output) is NP-hard. As a result, there is no known efficient general algorithm for forging a model's identity, although NP-hardness is a worst-case guarantee rather than a proof that every instance is hard. Logit-based signatures, by contrast, are easier to replicate. The practical security benefit is still limited: attackers can approximate the final-layer parameters from ranking information alone, only with lower precision.
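The notion of a feasible ranking can be made concrete under the same toy assumption that logits equal `W · h`. A top-k ranking holds exactly when a set of linear inequalities on `h` holds. Checking whether one given model can output a given ranking is therefore a linear feasibility problem, which is tractable.

The NP-hard problem described above is the inverse one: constructing a different `W` whose entire set of feasible rankings matches. This sketch does not attempt that. In place of a linear-programming solver, it uses the classic perceptron update as a dependency-free substitute. The helper names and the matrix are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def ranking_constraints(W: Matrix, ranking: Sequence[int]) -> List[Vector]:
    """Vectors v such that `ranking` is the exact top-k list iff v . h > 0 for all v.

    Consecutive ranked tokens must be strictly ordered, and the last ranked
    token must strictly beat every token outside the ranking.
    """
    cons = [sub(W[a], W[b]) for a, b in zip(ranking, ranking[1:])]
    last = ranking[-1]
    cons += [sub(W[last], W[j]) for j in range(len(W)) if j not in ranking]
    return cons


def find_hidden_state(W: Matrix, ranking: Sequence[int],
                      max_passes: int = 1000) -> Optional[Vector]:
    """Perceptron search for an h that satisfies every constraint strictly.

    Returns a witness h if one is found. None is inconclusive: the perceptron
    converges only when a strictly feasible h exists and may need more passes
    than allowed, so a rigorous check would use an LP solver instead.
    """
    cons = ranking_constraints(W, ranking)
    h = [0.0] * len(W[0])
    for _ in range(max_passes):
        updated = False
        for v in cons:
            if dot(v, h) <= 0:  # constraint violated (or on the boundary)
                h = [h_i + v_i for h_i, v_i in zip(h, v)]
                updated = True
        if not updated:
            return h
    return None


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.0, 0.0]]
print(find_hidden_state(W, [0, 1]))  # [3.0, 1.0]
print(find_hidden_state(W, [4]))     # None
```

In this toy matrix, token 4 has an all-zero row, so its logit is always 0. It can never strictly outrank tokens 0 and 2 at the same time, because their logits are `h[0]` and `-h[0]`. The ranking `[4]` is therefore infeasible for this model. A different matrix could make it feasible, and differences of this kind are what a ranking-based signature exploits.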
For the AI industry, this research highlights a tension between model verification and parameter protection. Developers building on top of LLM APIs often need assurance that they are querying the claimed model rather than a substitute. The paper's key insight offers a practical middle ground: an API can expose an unforgeable signature while limiting parameter theft by choosing a sufficiently small k. The value of k at which rankings still verify the model but no longer enable useful parameter extraction depends on the model and the API design.
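A simple counting argument gives one rough way to see why a smaller k exposes less per query. Over a vocabulary of V tokens, a top-k ranking is one of V * (V - 1) * ... * (V - k + 1) ordered lists, so it carries at most log2 of that many bits. This is a sketch under that counting assumption, not the paper's threshold analysis. The function name and the vocabulary size in the usage lines are hypothetical.

```python
import math


def ranking_bits_upper_bound(vocab_size: int, k: int) -> float:
    """Upper bound, in bits, on the information in one ordered top-k list.

    There are vocab_size * (vocab_size - 1) * ... * (vocab_size - k + 1)
    possible ordered top-k lists, so one query reveals at most log2 of that.
    """
    if not 0 < k <= vocab_size:
        raise ValueError("k must satisfy 0 < k <= vocab_size")
    return sum(math.log2(vocab_size - i) for i in range(k))


# Hypothetical vocabulary size, chosen only for illustration.
VOCAB_SIZE = 50_000
for k in (1, 5, 20):
    bits = ranking_bits_upper_bound(VOCAB_SIZE, k)
    print(f"k={k:>2}: at most {bits:.1f} bits per query")
```

The bound grows roughly linearly in k. It ignores which orderings are actually feasible for a given model, so real leakage per query is lower. The example only illustrates why shrinking k shrinks what each query can reveal.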
Looking ahead, this work is likely to influence how AI companies design API output formats and security policies. As model theft remains a threat, knowing which information can be safely exposed becomes crucial for protecting competitive moats and intellectual property in the AI sector.
- →Token rankings form unique model signatures that are hard to forge, because replicating one requires solving an NP-hard problem
- →Ranking-based APIs leak less sensitive information than logit APIs but still enable approximate parameter extraction
- →APIs can support model verification while limiting parameter extraction by keeping k below a model-specific threshold
- →The research identifies a previously unrecognized, signature-like property of language model outputs with security implications
- →Model IP theft remains possible through ranking information alone, requiring careful API design to balance transparency and security