y0news

#ctpo News & Analysis

1 article tagged with #ctpo. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 9h ago · 6/10

Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective

Researchers propose CTPO (Cumulative Token Policy Optimization), a new approach to reinforcement learning for large language models that addresses the bias-variance tradeoff in importance sampling ratios. By using cumulative token-level ratios with position-adaptive clipping, CTPO achieves superior performance on mathematical reasoning benchmarks compared to existing methods like PPO and GRPO.
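
To make the idea in the summary concrete, here is a minimal Python sketch of a cumulative token-level importance ratio combined with a clip range that varies by position. The summary does not give the paper's actual formulas, so everything here is an assumption for illustration: the function names, the square-root clipping schedule, the `base_eps` parameter, and the PPO-style pessimistic surrogate are all hypothetical and should not be read as CTPO's real objective.

```python
import math

def cumulative_ratios(logp_new, logp_old):
    """Cumulative importance ratio at each position t:
    the product over s <= t of pi_new(token_s) / pi_old(token_s),
    computed in log space for numerical stability."""
    ratios = []
    running = 0.0
    for ln, lo in zip(logp_new, logp_old):
        running += ln - lo
        ratios.append(math.exp(running))
    return ratios

def position_adaptive_clip(ratios, base_eps=0.2):
    """Clip each ratio to [1 - eps_t, 1 + eps_t].
    ASSUMPTION: eps_t = base_eps * sqrt(t) is a made-up schedule that
    widens the range with position; the paper's schedule may differ."""
    clipped = []
    for t, r in enumerate(ratios, start=1):
        eps_t = base_eps * math.sqrt(t)
        lower = max(0.0, 1.0 - eps_t)
        clipped.append(min(max(r, lower), 1.0 + eps_t))
    return clipped

def clipped_surrogate(logp_new, logp_old, advantages, base_eps=0.2):
    """PPO-style pessimistic surrogate averaged over tokens
    (an assumed form, not taken from the source)."""
    ratios = cumulative_ratios(logp_new, logp_old)
    clipped = position_adaptive_clip(ratios, base_eps)
    terms = [min(r * a, c * a) for r, c, a in zip(ratios, clipped, advantages)]
    return sum(terms) / len(terms)

# Usage: per-token log-probabilities under the new and old policies.
logp_new = [-0.9, -1.8, -0.4]
logp_old = [-1.0, -2.0, -0.5]
advantages = [1.0, 1.0, 1.0]
print(clipped_surrogate(logp_new, logp_old, advantages))
```

The cumulative product is what distinguishes this from a purely per-token ratio: a deviation early in the sequence carries forward to every later position, which is why a clip range that depends on position is a natural companion.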