🧠 AI · 🟢 Bullish · Importance 7/10
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
arXiv – CS AI | Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Shangtong Zhang, Yanjun Qi
🤖 AI Summary
Researchers demonstrate that large language models can perform reinforcement learning during inference through a new in-context RL (ICRL) prompting framework. The method shows that LLMs can optimize scalar reward signals to improve response quality across multiple rounds, achieving significant gains on tasks ranging from Olympiad-level mathematics to creative writing.
Key Takeaways
- LLMs exhibit reinforcement learning capabilities at inference time, without additional training, via a multi-round prompting framework.
- The ICRL prompting method concatenates prior responses and their scalar rewards into the context to guide self-improvement across inference rounds (see the sketch after this list).
- Performance improvements were demonstrated on Game of 24, creative writing, ScienceWorld, and Olympiad-level math competitions.
- The approach outperformed existing baselines such as Self-Refine and Reflexion across multiple benchmark tasks.
- Even when the reward signals are generated by the same LLM, the method still yields performance gains.
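To make the multi-round loop concrete, here is a minimal sketch of ICRL-style prompting. It assumes a generic chat-completion callable (`llm_complete`), a scalar scorer (`reward_fn`, which per the summary can itself be the same LLM), and a `rounds` budget; all three names are illustrative, not the paper's API.

```python
# Minimal sketch of ICRL-style prompting (hypothetical helpers:
# `llm_complete` stands in for any chat-completion call, and
# `reward_fn` for any scalar reward scorer).
from typing import Callable

def icrl_prompt(
    task: str,
    llm_complete: Callable[[str], str],      # prompt -> response
    reward_fn: Callable[[str, str], float],  # (task, response) -> scalar reward
    rounds: int = 5,
) -> str:
    """Each round, concatenate all prior responses and their scalar
    rewards into the prompt so the model can condition on them and
    propose a higher-reward response."""
    history: list[tuple[str, float]] = []
    best_response, best_reward = "", float("-inf")

    for _ in range(rounds):
        # Build the context: task plus every (response, reward) pair so far.
        context = f"Task: {task}\n"
        for i, (resp, rew) in enumerate(history, 1):
            context += f"\nAttempt {i}:\n{resp}\nReward: {rew}\n"
        context += ("\nPropose a new response that achieves a higher "
                    "reward than all previous attempts.\n")

        response = llm_complete(context)
        reward = reward_fn(task, response)
        history.append((response, reward))

        # Track the highest-reward response seen across rounds.
        if reward > best_reward:
            best_response, best_reward = response, reward

    return best_response
```

The key design point, as described in the takeaways, is that no weights are updated: the reward signal drives improvement purely through the growing in-context history.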
#reinforcement-learning #large-language-models #inference-optimization #self-improvement #prompting #icrl #test-time-scaling #ai-research
Read Original → via arXiv – CS AI