Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling
Researchers introduce Entropy-Guided Power Sampling (EGPS), a training-free sampling method that speeds up inference-time reasoning in base language models by concentrating sampling effort on high-entropy decision points instead of spreading it uniformly across the sequence. The technique achieves up to a 12.6x wall-clock speedup over standard MCMC sampling on mathematical and coding benchmarks while maintaining or improving accuracy.
EGPS extracts reasoning capabilities from base language models without fine-tuning or external verifiers. Its core insight is that the power distribution diverges from the base distribution primarily at sparse, high-entropy positions. Standard Metropolis-Hastings sampling ignores this structure: it spends compute on near-deterministic token positions, where proposals change almost nothing, while under-mixing at the decision points that actually determine the outcome. This mismatch has been a bottleneck for inference-time reasoning methods built on MCMC.
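A toy calculation can make this insight concrete. The sketch below is an illustration only, not the EGPS algorithm: it sharpens a single next-token distribution by raising it to a power and renormalizing, which is a token-level simplification (the power distribution discussed here is defined over whole sequences). The exponent `ALPHA` and both example distributions are assumed values chosen for the demonstration.

```python
def power_sharpen(probs, alpha):
    """Raise each probability to `alpha` and renormalize to sum to 1."""
    powered = [p ** alpha for p in probs]
    z = sum(powered)
    return [p / z for p in powered]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


ALPHA = 4.0  # assumed exponent, for illustration only

# Hypothetical next-token distributions at two kinds of positions.
near_deterministic = [0.98, 0.01, 0.01]  # low entropy: one obvious token
contested = [0.50, 0.30, 0.20]           # higher entropy: a real choice

tv_peaked = total_variation(
    near_deterministic, power_sharpen(near_deterministic, ALPHA)
)
tv_contested = total_variation(contested, power_sharpen(contested, ALPHA))

print(f"near-deterministic position: TV = {tv_peaked:.3f}")
print(f"contested position:          TV = {tv_contested:.3f}")
```

In this toy setting the sharpened distribution barely moves at the near-deterministic position and moves substantially at the contested one, which is the pattern the analysis above attributes to the power distribution.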
The technique builds on established MCMC theory and adapts it to language model inference. Because it uses entropy signals that are already computed during the forward pass, EGPS identifies where to focus without extra model calls and concentrates computational effort at those positions. This approach aligns with a broader trend toward adaptive compute allocation: spending resources where the model faces genuine uncertainty rather than distributing them uniformly across all operations.
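As a rough sketch of what "entropy signals from the forward pass" could look like in practice, the snippet below computes the Shannon entropy of each position's next-token distribution from raw logits and keeps the positions above a cutoff. The function names, the `threshold` value, and the example logits are all hypothetical; the source does not specify how EGPS selects positions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def high_entropy_positions(logits_per_position, threshold):
    """Indices whose next-token entropy exceeds `threshold` (hypothetical rule)."""
    return [
        i
        for i, logits in enumerate(logits_per_position)
        if token_entropy(logits) > threshold
    ]


# Hypothetical logits for a 4-token sequence over a 3-token vocabulary.
example_logits = [
    [10.0, 0.0, 0.0],  # near-deterministic
    [1.0, 0.9, 0.8],   # contested
    [8.0, 0.0, -1.0],  # near-deterministic
    [0.5, 0.4, 0.0],   # contested
]

selected = high_entropy_positions(example_logits, threshold=0.5)
print(selected)
```

Since the logits are already produced during generation, the only added work in this sketch is a softmax and a sum per position.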
For practitioners deploying language models in reasoning-heavy domains, these results have practical consequences. Reaching 75.8% accuracy on MATH500 and 62.2% on HumanEval at significantly reduced latency makes it more feasible to use smaller base models for complex tasks. Organizations that currently rely on inference-time scaling or larger models for reasoning could achieve comparable performance at lower computational cost. Because EGPS is training-free, it can be applied to existing deployed systems without a retraining pipeline.
- EGPS achieves up to 12.6x wall-clock speedup over standard MCMC sampling on mathematical reasoning benchmarks
- The method targets sparse, high-entropy decision points rather than uniformly sampling across sequences, eliminating wasted computation
- No training, fine-tuning, or external verifiers required: EGPS leverages entropy information already available in forward passes
- Tested on Qwen2.5-Math-7B, reaching 75.8% on MATH500, 62.2% on HumanEval, and 42.4% on GPQA
- Scales sampling cost with entropy mass rather than sequence length, making the approach increasingly efficient for longer generations
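One way to read the last point is with a toy comparison of two cost models: a budget proportional to sequence length versus a budget proportional to total entropy (the "entropy mass"). The sketch below assumes a made-up entropy profile in which a fixed number of hard positions is surrounded by easy ones; `make_profile` and its default values are hypothetical and not taken from the source.

```python
def make_profile(length, n_hard, easy=0.01, hard=1.0):
    """Hypothetical per-position entropies: a few hard tokens, the rest easy."""
    return [hard] * n_hard + [easy] * (length - n_hard)


def entropy_mass(entropies):
    """Total entropy summed over all positions."""
    return sum(entropies)


short_seq = make_profile(length=100, n_hard=5)
long_seq = make_profile(length=1000, n_hard=5)

length_ratio = len(long_seq) / len(short_seq)
mass_ratio = entropy_mass(long_seq) / entropy_mass(short_seq)

print(f"length grows by {length_ratio:.1f}x")
print(f"entropy mass grows by {mass_ratio:.1f}x")
```

Under these assumptions, lengthening the sequence tenfold with mostly easy tokens increases the entropy mass far less than tenfold, so a budget tied to entropy mass grows more slowly than one tied to length.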