y0news

#importance-sampling News & Analysis

3 articles tagged with #importance-sampling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

GIPO: Gaussian Importance Sampling Policy Optimization

GIPO (Gaussian Importance Sampling Policy Optimization) is a new reinforcement learning method that improves data efficiency for training multimodal AI agents. The approach uses Gaussian trust weights instead of hard clipping to better handle scarce or outdated training data, showing superior performance and stability across various experimental conditions.
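To illustrate the contrast the summary draws between hard clipping and a Gaussian trust weight, here is a minimal sketch. The summary does not give GIPO's formula, so the Gaussian-in-log-ratio form, the function names, and the `eps` and `sigma` values are all assumptions made for illustration and may differ from the paper.

```python
import math

def hard_clip_weight(ratio, eps=0.2):
    """PPO-style hard clipping: force the importance ratio into [1 - eps, 1 + eps]."""
    return min(max(ratio, 1.0 - eps), 1.0 + eps)

def gaussian_trust_weight(ratio, sigma=0.5):
    """Assumed soft alternative (not taken from the paper): a Gaussian in
    log-ratio space that equals 1 at ratio == 1 and decays smoothly as the
    new policy drifts away from the policy that collected the data."""
    return math.exp(-0.5 * (math.log(ratio) / sigma) ** 2)

# Compare the two on fresh (ratio near 1) and stale (ratio far from 1) samples.
for r in [0.25, 0.8, 1.0, 1.25, 4.0]:
    print(r, hard_clip_weight(r), round(gaussian_trust_weight(r), 4))
```

Under this assumed form, a stale sample is down-weighted gradually rather than being cut off at a fixed threshold, which is one plausible reading of "trust weights instead of hard clipping".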

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Quotient DAGs for Off-Policy Evaluation: Forward-Flow Importance Sampling and Exact Slate Propensities

Researchers introduce Quotient DAGs, a novel framework for off-policy evaluation that addresses variance issues in importance sampling by recognizing when generation process details are irrelevant to evaluation targets. The method computes exact unordered slate propensities efficiently through Forward-DP, a dynamic programming approach that avoids factorial enumeration, enabling practical evaluation for autoregressive slate recommendation systems.
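The idea of computing an unordered slate propensity without enumerating every ordering can be sketched with a generic subset dynamic program. This is not the paper's Forward-DP; it assumes a simple sequential sampler that draws items without replacement with probability proportional to a fixed weight, and all function names and weights are hypothetical.

```python
from itertools import permutations

def ordered_prob(order, weights):
    """Probability of drawing the items in this exact order, sampling without
    replacement in proportion to weight among the remaining items."""
    remaining = sum(weights.values())
    p = 1.0
    for item in order:
        p *= weights[item] / remaining
        remaining -= weights[item]
    return p

def slate_prob_bruteforce(slate, weights):
    """Reference answer: sum over all k! orderings of the slate."""
    return sum(ordered_prob(o, weights) for o in permutations(slate))

def slate_prob_dp(slate, weights):
    """Same quantity in O(k^2 * 2^k): push probability mass forward over
    subsets, merging every ordering that reaches the same set of items."""
    items = list(slate)
    k = len(items)
    total = sum(weights.values())
    mass = [0.0] * (1 << k)  # mass[mask]: first draws are exactly this subset
    mass[0] = 1.0
    for mask in range(1 << k):
        if mass[mask] == 0.0:
            continue
        used = sum(weights[items[i]] for i in range(k) if mask >> i & 1)
        for i in range(k):
            if mask >> i & 1:
                continue  # item i is already in the partial slate
            mass[mask | (1 << i)] += mass[mask] * weights[items[i]] / (total - used)
    return mass[(1 << k) - 1]

weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}  # hypothetical item weights
print(slate_prob_dp(("a", "c"), weights), slate_prob_bruteforce(("a", "c"), weights))
```

The subset DP works here because the probability of the next draw depends only on which items are already taken, not on the order they were taken in, so orderings that reach the same partial slate can share one state.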

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Rethinking Importance Sampling in LLM Policy Optimization: A Cumulative Token Perspective

Researchers propose CTPO (Cumulative Token Policy Optimization), a new approach to reinforcement learning for large language models that addresses the bias-variance tradeoff in importance sampling ratios. By using cumulative token-level ratios with position-adaptive clipping, CTPO achieves superior performance on mathematical reasoning benchmarks compared to existing methods like PPO and GRPO.
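As a rough illustration of what a cumulative token-level ratio looks like, the sketch below accumulates per-token log-probability differences into a running ratio for each position. The summary does not specify CTPO's clipping rule, so the square-root widening schedule, the function names, and `base_eps` are hypothetical stand-ins rather than the paper's method.

```python
import math

def cumulative_log_ratios(logp_new, logp_old):
    """Running sum of per-token log-ratios: entry t is the log of the product
    of the token ratios pi_new / pi_old over positions 1..t."""
    out, running = [], 0.0
    for new, old in zip(logp_new, logp_old):
        running += new - old
        out.append(running)
    return out

def position_adaptive_clip(cum_log_ratios, base_eps=0.2):
    """Hypothetical schedule (an assumption, not from the paper): clip the
    cumulative log-ratio at +/- base_eps * sqrt(t), so later positions, whose
    ratios accumulate more terms, get a wider band. Returns clipped ratios."""
    clipped = []
    for t, c in enumerate(cum_log_ratios, start=1):
        bound = base_eps * math.sqrt(t)
        clipped.append(math.exp(min(max(c, -bound), bound)))
    return clipped

# Hypothetical per-token log-probabilities under the new and old policies.
logp_new = [-1.0, -2.0, -0.5]
logp_old = [-1.5, -2.5, -0.5]
print(position_adaptive_clip(cumulative_log_ratios(logp_new, logp_old)))
```

The trade-off this illustrates: a product of many token ratios has higher variance at later positions, so a fixed clip range either over-clips late tokens or under-clips early ones, while a position-dependent range can balance the two.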