AIBullisharXiv โ CS AI ยท 5h ago
๐ง
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Researchers developed R1-Code-Interpreter, a large language model that uses multi-stage reinforcement learning to autonomously generate code for step-by-step reasoning across diverse tasks. The 14B parameter model achieves 72.4% accuracy on test tasks, outperforming GPT-4o variants and demonstrating emergent self-checking capabilities through code generation.
๐ข Hugging Face๐ง GPT-4