y0news

#sparse-rewards News & Analysis

4 articles tagged with #sparse-rewards. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

The Sample Complexity of Multiclass and Sparse Contextual Bandits

Researchers present optimal algorithms for sparse contextual bandits that achieve sample complexity of Õ((s/ε² + |A|/ε)log|Π|/δ), closing a gap from prior work that had exponential dependence on action set size. The results apply to multiclass classification and combinatorial semi-bandits through information-theoretic and algorithmic approaches.
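To give a feel for how the two terms in the bound trade off, here is a minimal Python sketch that evaluates its leading-order shape. It rests on several assumptions not stated in the summary: the logarithm is read as log(|Π|/δ), every constant is set to 1, and the polylogarithmic factors hidden by Õ are dropped. The function name and parameter names are illustrative only.

```python
import math


def sparse_bandit_sample_bound(s, num_actions, eps, num_policies, delta):
    """Leading-order shape of (s/eps^2 + |A|/eps) * log(|Pi|/delta).

    Assumptions: constants are 1, hidden polylog factors are ignored,
    and the log term is read as log(|Pi| / delta).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    sparsity_term = s / eps ** 2       # scales with sparsity s, not with |A|
    action_term = num_actions / eps    # |A| enters only at the 1/eps rate
    return (sparsity_term + action_term) * math.log(num_policies / delta)


if __name__ == "__main__":
    # Hypothetical values chosen only to show how the function is called.
    print(sparse_bandit_sample_bound(s=2, num_actions=10, eps=0.1,
                                     num_policies=1000, delta=0.05))
```

Under this reading, the action-set size multiplies only the 1/ε term, so for small ε the s/ε² term dominates.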

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

HPO: Hysteretic Policy Optimization for Stable and Efficient Training under Sparse-Reward Regime

Researchers propose Hysteretic Policy Optimization (HPO), a refinement of GRPO (Group Relative Policy Optimization) that addresses training instability in sparse-reward environments by downweighting negative-advantage updates and normalizing by the mean response length rather than each response's own length. The adaptive variant (A-HPO) achieves a 15% reward improvement over GRPO on benchmark tasks.
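The two changes named in the summary can be illustrated with a rough Python sketch. It is not the paper's implementation. The group-normalized advantage is the standard GRPO form. The fixed `neg_scale` factor, the function names, and the idea of returning one per-token weight per response are all assumptions made here for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def hysteretic_token_weights(rewards, lengths, neg_scale=0.5):
    """Per-token loss weight for each response in a group.

    Assumed form: negative advantages are shrunk by `neg_scale`
    (hypothetical hyperparameter), and every response is divided by the
    group's mean length instead of its own length.
    """
    advantages = group_advantages(rewards)
    mean_len = mean(lengths)
    weights = []
    for adv in advantages:
        scale = 1.0 if adv >= 0 else neg_scale  # downweight negative updates
        weights.append(scale * adv / mean_len)  # shared length normalizer
    return weights


if __name__ == "__main__":
    # Sparse-reward group: one success among four sampled responses.
    print(hysteretic_token_weights([1.0, 0.0, 0.0, 0.0], [10, 20, 30, 40]))
```

Because the normalizer is shared across the group, the three failed responses receive the same per-token weight regardless of their lengths, which would not hold under per-response length normalization.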

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14

Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward

Researchers introduced AC3 (Actor-Critic for Continuous Chunks), a new reinforcement learning framework that addresses challenges in long-horizon robotic manipulation tasks with sparse rewards. The system uses continuous action chunks with stabilization mechanisms and achieved superior performance on 25 benchmark tasks using minimal demonstrations.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Researchers propose ACWI, a new reinforcement learning framework that dynamically balances intrinsic and extrinsic rewards through adaptive scaling coefficients. The system uses a lightweight Beta Network to optimize exploration in sparse reward environments, demonstrating improved sample efficiency and stability in MiniGrid experiments.
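As a rough illustration of balancing intrinsic and extrinsic rewards with an adaptive coefficient, the Python sketch below blends the two signals using a state-dependent weight. The summary does not describe the Beta Network's architecture or training signal, so the single sigmoid unit used here is a hypothetical stand-in, and all names and the additive blending form are assumptions.

```python
import math


def beta_coefficient(features, weights, bias=0.0):
    """Stand-in for a learned scaling network: one sigmoid unit.

    Maps state features to a coefficient strictly between 0 and 1.
    (Hypothetical; not the architecture from the paper.)
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def shaped_reward(r_ext, r_int, beta):
    """Assumed blending rule: extrinsic reward plus beta-scaled intrinsic bonus."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    # Sparse setting: no extrinsic reward yet, so the bonus drives exploration.
    beta = beta_coefficient(features=[0.2, -0.1], weights=[1.0, 1.0])
    print(shaped_reward(r_ext=0.0, r_int=2.0, beta=beta))
```

In a full agent the coefficient's parameters would be updated during training. Here they are fixed only to show the data flow.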