y0news
#sample-efficiency2 articles
2 articles
AINeutralarXiv โ€“ CS AI ยท 4h ago0
๐Ÿง 

Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Researchers propose ACWI, a new reinforcement learning framework that dynamically balances intrinsic and extrinsic rewards through adaptive scaling coefficients. The system uses a lightweight Beta Network to optimize exploration in sparse reward environments, demonstrating improved sample efficiency and stability in MiniGrid experiments.

AINeutralarXiv โ€“ CS AI ยท 4h ago0
๐Ÿง 

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Researchers propose OVMSE, a new framework for Offline-to-Online Multi-Agent Reinforcement Learning that addresses key challenges in transitioning from offline training to online fine-tuning. The framework introduces Offline Value Function Memory and Sequential Exploration strategies to improve sample efficiency and performance in multi-agent environments.