#exploration-strategy News & Analysis

2 articles tagged with #exploration-strategy. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AIBullisharXiv – CS AI · May 97/10

🧠

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Researchers propose Lorem Perturbation for Exploration (LoPE), a training technique that addresses the zero-advantage problem in reinforcement learning for large language models by prepending random Latin-based text to prompts, enabling broader reasoning exploration across 1.7B to 7B parameter models.

🏢 Perplexity

AINeutralarXiv – CS AI · Jun 106/10

🧠

Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning

Researchers introduce DiRL, a reinforcement learning framework that distinguishes between genuine reasoning and memorization in large language models by anchoring exploration to an internal reasoning-memorization direction. The method integrates with Group Relative Policy Optimization to improve performance on mathematical and reasoning benchmarks while suppressing exploration of memorized shortcuts.