9 articles tagged with #offline-rl. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv · CS AI · Feb 27 · 7/10
🧠 Researchers propose Decision MetaMamba (DMM), a new AI model architecture that improves offline reinforcement learning by addressing information loss issues in Mamba-based models. The solution uses a dense layer-based sequence mixer and modified positional structure to achieve state-of-the-art performance with fewer parameters.
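The core idea of a dense layer-based sequence mixer can be sketched as a learned linear combination over timesteps. This is a minimal illustration only; the name `dense_sequence_mixer` and the causal-masking detail are assumptions, not DMM's actual design.

```python
import numpy as np

def dense_sequence_mixer(x, weights):
    """Mix information across timesteps with a plain dense layer:
    each output position is a learned weighted sum of earlier input
    positions, applied to every feature channel.

    x: (T, D) sequence of token embeddings
    weights: (T, T) learned mixing matrix
    """
    T = x.shape[0]
    causal = np.tril(weights[:T, :T])  # mask out future timesteps
    return causal @ x
```

Because the mixing weights are an ordinary dense matrix rather than a state-space recurrence, every past position contributes directly to every output position.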
AI Neutral · arXiv · CS AI · Mar 5 · 5/10
🧠 Researchers propose Imaginary Planning Distillation (IPD), a novel framework that enhances offline reinforcement learning by incorporating planning into sequential policy models. IPD uses world models and Model Predictive Control to generate optimal rollouts, training Transformer-based policies that significantly outperform existing methods on D4RL benchmarks.
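A planner in the spirit described above can be sketched as random-shooting Model Predictive Control over a world model. Everything here (`mpc_plan`, the `step_fn` interface, uniform action sampling) is a simplified assumption rather than IPD's actual procedure; the distilled policy would then be trained to imitate the selected rollouts.

```python
import numpy as np

def mpc_plan(step_fn, state, horizon=5, n_samples=64, seed=0):
    """Random-shooting MPC: sample candidate action sequences, roll each
    out through a world model step_fn(state, action) -> (next_state,
    reward), and keep the sequence with the highest imagined return."""
    rng = np.random.default_rng(seed)
    best_return, best_plan = -np.inf, None
    for _ in range(n_samples):
        plan = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state, 0.0
        for a in plan:
            s, r = step_fn(s, a)
            total += r
        if total > best_return:
            best_return, best_plan = total, plan
    return best_plan
```

In a distillation setup, the `(state, best_plan)` pairs become supervised targets for the sequence policy, so planning cost is paid only at training time.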
AI Bullish · arXiv · CS AI · Mar 2 · 7/10
🧠 Researchers developed Score Matched Actor-Critic (SMAC), a new offline reinforcement learning method that enables smooth transition to online RL algorithms without performance drops. SMAC achieved successful transfer in all 6 D4RL tasks tested and reduced regret by 34-58% in 4 of 6 environments compared to best baselines.
AI Neutral · arXiv · CS AI · Mar 17 · 4/10
🧠 Researchers introduce Chunk-Guided Q-Learning (CGQ), a new offline reinforcement learning algorithm that combines single-step and multi-step temporal difference learning approaches. The method achieves better performance on long-horizon tasks by reducing error accumulation while maintaining fine-grained value propagation, with theoretical guarantees and empirical validation on OGBench tasks.
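The blend of single-step and multi-step temporal-difference targets can be illustrated as follows. The mixing rule (a plain average) and the function names are hypothetical, not CGQ's actual objective.

```python
def n_step_td_target(rewards, bootstrap_value, gamma=0.99):
    """n-step return: discounted reward sum plus a bootstrapped tail value."""
    G = bootstrap_value
    for r in reversed(rewards):
        G = r + gamma * G
    return G

def chunked_target(rewards, values, gamma=0.99, chunk=4):
    """Combine a one-step target (fine-grained value propagation) with a
    chunk-length target (fewer bootstraps, less error accumulation).

    rewards: rewards r_0..r_{chunk-1} along a dataset trajectory
    values:  value estimates V(s_0)..V(s_chunk) at each timestep
    """
    one_step = n_step_td_target(rewards[:1], values[1], gamma)
    multi = n_step_td_target(rewards[:chunk], values[chunk], gamma)
    return 0.5 * (one_step + multi)
```

Longer chunks propagate reward over more steps per update but lean harder on the dataset's action sequence; the trade-off is why a combination can beat either extreme on long horizons.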
AI Neutral · arXiv · CS AI · Mar 5 · 4/10
🧠 Research evaluates offline reinforcement learning algorithms for wireless network control, finding Conservative Q-Learning produces more robust policies under stochastic conditions than sequence-based methods. The study provides practical guidance for AI-driven network management in O-RAN and 6G systems where online exploration is unsafe.
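Conservative Q-Learning's defining regularizer, simplified to a finite action set, looks roughly like this. It is a sketch of the standard CQL idea, not this study's exact setup.

```python
import numpy as np

def cql_penalty(q_all_actions, q_data_action):
    """CQL-style conservatism: push down a soft maximum (logsumexp) of Q
    over all actions while pushing up Q at the action actually seen in
    the dataset, so out-of-distribution actions are not overvalued."""
    soft_max = np.log(np.exp(q_all_actions).sum())
    return soft_max - q_data_action
```

Minimizing this term alongside the usual TD loss biases Q-values downward exactly where the data gives no evidence, which is why CQL tends to stay robust when the environment is stochastic.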
AI Neutral · arXiv · CS AI · Mar 4 · 4/10
🧠 Researchers propose SAGE (Self-supervised Action Gating with Energies), a new method to improve diffusion planners in offline reinforcement learning by filtering out dynamically inconsistent trajectories. The approach uses a latent consistency signal to re-rank candidate actions at inference time, improving performance across locomotion, navigation, and manipulation tasks without requiring environment rollouts or policy retraining.
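Inference-time re-ranking of candidate actions by a consistency score can be sketched as below; `energy_fn` stands in for SAGE's learned latent-consistency signal (lower energy meaning more dynamically consistent), and the names are hypothetical.

```python
import numpy as np

def rerank_actions(candidates, energy_fn):
    """Score each candidate action (or action chunk) with an energy and
    return the candidates ordered from most to least consistent.
    Executing the first element implements energy-based gating."""
    energies = np.array([energy_fn(a) for a in candidates])
    return [candidates[i] for i in np.argsort(energies)]
```

Because only the ranking happens at inference time, this kind of filter needs no environment rollouts and no retraining of the underlying diffusion planner.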
AI Neutral · arXiv · CS AI · Mar 3 · 4/10
🧠 Researchers introduce the Return Augmented (REAG) method for Decision Transformer frameworks to improve offline reinforcement learning when training data comes from dynamics that differ from the target domain. The method aligns return distributions between source and target domains, with theoretical analysis showing it achieves optimal performance levels despite dynamics shifts.
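One simple way to align return distributions across domains is moment matching on returns-to-go; this is an illustrative stand-in, not REAG's actual alignment objective.

```python
import numpy as np

def align_returns(source_returns, target_returns):
    """Shift and rescale source-domain returns so their mean and standard
    deviation match the target domain's return distribution, so that
    return-conditioned policies see comparable conditioning values."""
    s = np.asarray(source_returns, dtype=float)
    t = np.asarray(target_returns, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-8) * t.std() + t.mean()
```

For a Decision Transformer, the aligned values would replace the raw returns-to-go used to condition the policy during training on source-domain trajectories.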
AI Neutral · arXiv · CS AI · Mar 2 · 4/10
🧠 Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
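Maximizing a lower confidence bound of the Q-function is typically done with an ensemble. A minimal sketch, assuming an ensemble LCB over a discrete candidate set (names and the choice of `beta` are illustrative):

```python
import numpy as np

def lcb_action(q_ensemble, actions, beta=2.0):
    """Choose the action maximizing mean(Q) - beta * std(Q) across an
    ensemble of Q-functions, avoiding actions whose high values may be
    driven by estimation error rather than genuine quality."""
    values = np.array([[q(a) for a in actions] for q in q_ensemble])
    lcb = values.mean(axis=0) - beta * values.std(axis=0)
    return actions[int(np.argmax(lcb))]
```

Actions on which the ensemble disagrees get a large standard deviation and hence a low bound, which is exactly the pessimism that keeps errors from compounding across bootstrapped updates.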
AI Neutral · arXiv · CS AI · Mar 2 · 4/10
🧠 Researchers present theoretical advances in offline reinforcement learning that extend beyond current limitations to work with parameterized policies over large or continuous action spaces. The work connects mirror descent to natural policy gradient methods and reveals a surprising unification between offline RL and imitation learning.
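The mirror-descent/natural-policy-gradient connection is easiest to see in the tabular softmax case, where a KL-mirror-descent update multiplies the policy by exp(eta * Q) and coincides with a natural-gradient step. A minimal sketch of that classical update (the paper's contribution is extending such analysis to parameterized policies):

```python
import numpy as np

def mirror_descent_step(logits, q_values, eta=1.0):
    """KL-regularized mirror descent on a softmax policy over a finite
    action set: pi_new(a) is proportional to pi_old(a) * exp(eta * Q(a)),
    implemented as an additive update in logit space."""
    new_logits = logits + eta * q_values
    new_logits = new_logits - new_logits.max()  # numerical stability
    probs = np.exp(new_logits)
    return probs / probs.sum()
```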