AI · Bullish · arXiv – CS AI · 6h ago · 6/10
Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning
Researchers introduce the Independent Combinatorial Tokens (ICT) framework to improve large language model (LLM) reasoning by addressing the entropy collapse and entropy explosion problems in reinforcement learning. Using Jensen-Shannon divergence to identify critical token branching points, ICT achieves a 4.58% average improvement in pass@4 scores across math, commonsense, and Olympiad benchmarks on Qwen models.
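
For readers unfamiliar with the metric, here is a minimal sketch of the standard Jensen-Shannon divergence between two next-token probability distributions. The summary does not say which distributions ICT compares or how it thresholds the result, so the inputs `p` and `q` and both function names are hypothetical illustrations, not the paper's implementation.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q to their midpoint m.

    Symmetric in p and q, and bounded between 0 and ln(2) when measured in nats.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical next-token distributions over a 3-token vocabulary.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
divergence = js_divergence(p, q)
```

Identical distributions give a divergence of 0, and distributions with no overlapping support give the maximum of ln 2, which is why a large value at a given position can be read as a point where the two distributions disagree sharply.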