#rl-training News & Analysis

2 articles tagged with #rl-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

Researchers present AEM (Adaptive Entropy Modulation), a new credit assignment method for reinforcement learning that improves how language model agents learn from sparse rewards without requiring dense supervision. The technique adaptively modulates entropy during training to balance exploration and exploitation, achieving a 1.4% improvement on the challenging SWE-bench-Verified benchmark across models ranging from 1.5B to 32B parameters.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Researchers introduce the Two-Stage Decision-Sampling Hypothesis to explain how reinforcement learning enables self-reflection capabilities in large language models, demonstrating that RL's superior performance stems from improved decision-making rather than generation quality. The theory shows that reward gradients distribute asymmetrically across policy components, explaining why RL succeeds where supervised fine-tuning fails.