AIBullish · arXiv – CS AI · 6h ago · 7/10
🧠
Emergent Slow Thinking in LLMs as Inverse Tree Freezing
Researchers present a statistical-physics framework explaining how large language models develop multi-step reasoning through reinforcement learning with verifiable rewards (RLVR), modeling the process as inverse tree freezing in a concept network. They propose Annealed-RLVR, a timing-optimized training method that outperforms standard RLVR by applying supervised fine-tuning at peak frustration rather than after convergence, preventing policy collapse.
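The timing idea behind Annealed-RLVR can be illustrated with a toy scheduler: run RLVR while tracking a "frustration" signal, and switch to supervised fine-tuning as soon as that signal peaks instead of waiting for RL convergence. This is a minimal sketch, not the paper's method; the `frustration` function and the peak-detection rule are invented stand-ins for whatever training statistic the authors actually use.

```python
def frustration(step: int) -> float:
    # Hypothetical stand-in signal: rises, peaks, then falls. A real
    # metric would be derived from training statistics (e.g. stalled
    # verifiable reward), not from the step index.
    return -(step - 5) ** 2


def schedule(num_steps: int) -> list[str]:
    """Label each training step 'RLVR' or 'SFT'.

    Stay in RLVR while frustration is still rising; on the first
    decline (peak passed), switch permanently to SFT rather than
    running RLVR to convergence.
    """
    phases = []
    prev = float("-inf")
    switched = False
    for step in range(num_steps):
        f = frustration(step)
        if not switched and f < prev:  # first drop => peak was last step
            switched = True
        phases.append("SFT" if switched else "RLVR")
        prev = f
    return phases


print(schedule(10))  # RLVR through the peak at step 5, then SFT
```

With this toy signal the schedule runs six RLVR steps and then switches; the point is only that the switch is triggered by the signal's shape, not by a fixed convergence criterion.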