Modeling Hierarchical Thinking in Large Reasoning Models
Researchers propose modeling the Chain-of-Thought (CoT) of Large Reasoning Models (LRMs) as a trajectory through a six-state Finite State Machine (FSM), which makes reasoning dynamics easier to analyze and control. Building on this view, they introduce Q-Value guided steering, a training-free method that applies sparse activation steering only at sentence boundaries. They report performance gains across multiple benchmarks with minimal computational overhead.
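To make the FSM view concrete, here is a minimal sketch, not the authors' implementation. It assumes each sentence of a reasoning trace has already been labeled with one of six states, written as the placeholder integers 0 to 5 because the summary does not name the states. The every-visit Monte Carlo estimator, the function names, and the toy trajectories and rewards are all illustrative assumptions. The activation steering itself is not shown; the sketch only illustrates how a Q-value table over state transitions could indicate which next state to steer toward at a sentence boundary.

```python
from collections import defaultdict

N_STATES = 6  # six states per the summary; integer ids are placeholders


def transition_matrix(trajectories, n_states=N_STATES):
    """Row-normalized counts of observed state-to-state transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            counts[s][s_next] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States with no outgoing transitions keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def monte_carlo_q(trajectories, rewards):
    """Assumed estimator: Q[(s, s_next)] is the mean final reward over
    every occurrence of the transition s -> s_next."""
    sums = defaultdict(float)
    visits = defaultdict(int)
    for traj, reward in zip(trajectories, rewards):
        for s, s_next in zip(traj, traj[1:]):
            sums[(s, s_next)] += reward
            visits[(s, s_next)] += 1
    return {key: sums[key] / visits[key] for key in sums}


def best_next_state(q, state):
    """Highest-Q successor of `state`, or None if none was observed."""
    candidates = {s_next: v for (s, s_next), v in q.items() if s == state}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy data (hypothetical): one state id per sentence, reward 1.0 if the
# final answer was correct and 0.0 otherwise.
trajectories = [[0, 1, 2, 5], [0, 1, 3, 1, 2, 5], [0, 3, 4, 5]]
rewards = [1.0, 1.0, 0.0]

P = transition_matrix(trajectories)
Q = monte_carlo_q(trajectories, rewards)
```

In this toy setup, `best_next_state(Q, 0)` picks the successor of state 0 that co-occurred with correct answers most often. A steering method in the spirit of the summary would then nudge activations toward that state at the next sentence boundary, a step this sketch leaves out.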