AI · Bullish · arXiv – CS AI · 6h ago · 7/10
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration
Researchers propose Lorem Perturbation for Exploration (LoPE), a training technique that targets the zero-advantage problem in reinforcement learning for large language models. LoPE prepends random Latin-based text to prompts to broaden reasoning exploration, and is applied to models ranging from 1.7B to 7B parameters.
🏢 Perplexity
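
The prompt perturbation described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the word pool `LOREM_WORDS`, the function name `perturb_prompt`, the default prefix length, and the blank-line separator are all assumptions made for illustration, since the summary only says that random Latin-based text is prepended to the prompt.

```python
import random
from typing import Optional

# Hypothetical word pool: the summary does not say which vocabulary LoPE samples from.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def perturb_prompt(
    prompt: str,
    n_words: int = 12,  # assumed prefix length, not taken from the paper
    rng: Optional[random.Random] = None,
) -> str:
    """Return the prompt with a random filler prefix prepended.

    The original prompt text is left unchanged; only a prefix is added.
    """
    if n_words <= 0:
        return prompt
    rng = rng or random.Random()
    # Sample words with replacement so any prefix length is possible.
    prefix = " ".join(rng.choices(LOREM_WORDS, k=n_words))
    # Assumed separator: a blank line between the filler and the real prompt.
    return f"{prefix}\n\n{prompt}"


if __name__ == "__main__":
    question = "What is the sum of the first 10 positive integers?"
    # Different seeds give different prefixes for the same underlying question.
    for seed in range(3):
        print(repr(perturb_prompt(question, n_words=6, rng=random.Random(seed))))
```

In a training loop, one would presumably sample responses for several perturbed variants of the same question, so that the sampled group is less likely to be uniform; how LoPE actually schedules or scores these variants is not stated in the summary.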