Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics
Researchers propose Min-k Sampling, a novel decoding strategy for large language models that dynamically identifies semantic cliffs in logit distributions to optimize token truncation. Unlike temperature-sensitive methods like Top-k and Top-p, Min-k achieves temperature invariance through relative logit dynamics while maintaining superior text quality across reasoning, creative writing, and human evaluation benchmarks.
Min-k Sampling addresses a fundamental challenge in LLM decoding: balancing output diversity with quality while remaining robust to hyperparameter variations. Current industry-standard methods, including Top-k, Top-p, and Min-p, operate in probability space and require careful temperature tuning to prevent performance degradation. This sensitivity creates friction for practitioners, who must recalibrate parameters across different use cases and model architectures.
The technical innovation centers on analyzing local geometric properties of sorted logit distributions rather than relying on global statistics. By detecting sharp transitions (semantic cliffs) between confident core tokens and uncertain long-tail tokens, Min-k dynamically adjusts truncation boundaries at each generation step. This decouples temperature scaling from truncation logic, removing a coupling that plagued previous methods. The formal proof of strict temperature invariance provides theoretical grounding often absent in heuristic sampling strategies.
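The intuition behind temperature invariance can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's actual algorithm: it assumes the "semantic cliff" is the largest gap between consecutive sorted logits (the function name `min_k_truncate` and the `max_window` parameter are invented for this sketch). Because dividing every logit by a temperature T rescales all gaps by the same positive factor, the location of the largest gap, and hence the kept token set, does not change.

```python
import numpy as np

def min_k_truncate(logits: np.ndarray, max_window: int = 50) -> set:
    """Hypothetical cliff-based truncation: keep the tokens above the
    largest gap in the sorted logits. Gaps in logit space all scale by
    1/T under temperature scaling, so the cliff position is T-invariant."""
    order = np.argsort(logits)[::-1]                 # token ids, highest logit first
    sorted_logits = logits[order]
    # Consecutive differences within the leading window of candidates.
    gaps = sorted_logits[:max_window - 1] - sorted_logits[1:max_window]
    cliff = int(np.argmax(gaps))                     # index of the sharpest drop
    return set(order[:cliff + 1].tolist())           # the "confident core"

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
logits[:5] += 6.0                                    # plant a confident 5-token core

baseline = min_k_truncate(logits)
for T in (0.3, 1.0, 2.5):
    # Temperature rescales logits but preserves their order and relative gaps.
    assert min_k_truncate(logits / T) == baseline
```

A probability-space rule such as Top-p does not have this property: softmax probabilities change nonlinearly with T, so the same nucleus threshold keeps different token sets at different temperatures.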
For LLM developers and deployed applications, this advance reduces hyperparameter engineering overhead while improving output quality consistency. Temperature invariance particularly benefits production systems serving heterogeneous use cases simultaneously without separate configuration pipelines. The empirical validation across reasoning benchmarks and creative writing demonstrates broad applicability beyond narrow task categories.
Public release of code and models enables rapid ecosystem adoption. As LLM inference becomes increasingly cost-competitive, decoding efficiency gains compound across billions of daily requests. The research establishes a paradigm shift from global statistical heuristics to local geometric analysis, likely influencing subsequent sampling strategy development. Organizations optimizing inference pipelines should evaluate Min-k integration, especially those currently constrained by temperature sensitivity in multi-purpose deployments.
- Min-k Sampling achieves strict temperature invariance by analyzing local logit distribution geometry rather than relying on global statistics.
- The method dynamically identifies semantic cliffs to optimize token truncation boundaries at each generation step without manual tuning.
- Empirical results show consistent improvements in text quality across reasoning tasks and creative writing under extreme temperature settings.
- Temperature invariance reduces the hyperparameter engineering burden for production LLM systems serving multiple use cases.
- Public release of code and models enables widespread adoption and a potential paradigm shift in sampling strategy design.