AI · Bullish · arXiv – CS AI · 6h ago
Learning Structured Reasoning via Tractable Trajectory Control
Researchers propose Ctrl-R, a new framework that improves large language models' reasoning abilities by systematically discovering and reinforcing diverse reasoning patterns through structured trajectory control. The method enables better exploration of complex reasoning behaviors and shows consistent improvements across mathematical reasoning tasks in both language and vision-language models.