y0news

#optimization-methods News & Analysis

1 article tagged with #optimization-methods. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 6h ago · 6/10
🧠

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

Researchers propose Listwise Policy Optimization (LPO), a framework for training large language models that builds on group-based reinforcement learning with verifiable rewards (RLVR). LPO explicitly projects the policy toward a target distribution on the response simplex, the probability simplex over a group of sampled responses. The authors report consistent performance improvements across reasoning tasks while maintaining training stability and response diversity.
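The summary does not say how LPO builds its target distribution, so the sketch below is only an illustration of the general "target-projection on the response simplex" idea, under stated assumptions. It restricts the policy to a group of K sampled responses by renormalising their sequence log-probabilities with a softmax. It then uses a hypothetical exponentially tilted target, q_i ∝ p_i · exp(r_i / tau), where r_i is a verifiable reward and tau is an assumed temperature, and measures the listwise loss KL(q || p). All function names and the tilted-target choice are hypothetical stand-ins, not the paper's method.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def group_policy(seq_logprobs: List[float]) -> List[float]:
    # Restrict the policy to the K sampled responses by renormalising their
    # sequence log-probabilities onto the K-dimensional simplex.
    return softmax(seq_logprobs)


def tilted_target(seq_logprobs: List[float], rewards: List[float], tau: float) -> List[float]:
    # HYPOTHETICAL target: q_i proportional to p_i * exp(r_i / tau).
    # This is an assumed construction, not necessarily the one LPO uses.
    return softmax([lp + r / tau for lp, r in zip(seq_logprobs, rewards)])


def listwise_kl(target: List[float], policy: List[float]) -> float:
    # KL(q || p) over the group; terms with q_i = 0 contribute nothing.
    return sum(q * (math.log(q) - math.log(p)) for q, p in zip(target, policy) if q > 0)


def kl_grad_wrt_logprobs(target: List[float], policy: List[float]) -> List[float]:
    # With q held fixed, d KL(q || softmax(z)) / dz_i = p_i - q_i.
    return [p - q for p, q in zip(policy, target)]


# Usage: four sampled responses; responses 0 and 2 pass the verifier.
logprobs = [-12.0, -10.5, -11.0, -13.2]
rewards = [1.0, 0.0, 1.0, 0.0]
tau = 0.5

policy = group_policy(logprobs)
target = tilted_target(logprobs, rewards, tau)
loss = listwise_kl(target, policy)
grad = kl_grad_wrt_logprobs(target, policy)

# One small gradient step on the log-probabilities moves the group policy
# toward the fixed target, i.e. it reduces the listwise KL.
lr = 0.1
new_logprobs = [z - lr * g for z, g in zip(logprobs, grad)]
new_loss = listwise_kl(target, group_policy(new_logprobs))
print(f"KL before: {loss:.4f}, after one step: {new_loss:.4f}")
```

In a real trainer the sequence log-probabilities would come from the language model and the gradient p - q would flow back into its parameters. Projecting onto the group simplex turns the update into a listwise comparison of all K responses at once rather than K independent per-sample terms.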