AI · Neutral · arXiv – CS AI · 7h ago · 6/10
LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
Researchers introduce LC-ERD, a framework for improving Large Language Model reasoning by mining high-quality supervision signals through consistency-regulated reward decomposition. The method targets three challenges in self-aligned LLM training: it reduces label noise, provides granular step-level guidance, and prevents distributional collapse. The authors suggest this could improve both reasoning quality and generalization.