#offline-rl News & Analysis

16 articles tagged with #offline-rl. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 4d ago · 7/10

Yes, Q-learning Helps Offline In-Context RL

Researchers demonstrate that integrating reinforcement learning objectives into offline in-context RL frameworks significantly outperforms supervised learning approaches like Algorithm Distillation, achieving ~30% performance improvements across diverse environments and doubling performance in complex settings. The findings validate that aligning ICRL training with RL reward-maximization goals, particularly through conservative value learning, produces more effective agents.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing

Researchers propose Decision MetaMamba (DMM), a new AI model architecture that improves offline reinforcement learning by addressing information loss issues in Mamba-based models. The solution uses a dense layer-based sequence mixer and modified positional structure to achieve state-of-the-art performance with fewer parameters.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Researchers introduce BORA, an offline-to-online reinforcement learning framework that enables Vision-Language-Action (VLA) models to perform complex dexterous robotic manipulation tasks more reliably in real-world settings. The method combines offline critic training with lightweight online adaptation, achieving 33% improvement in success rates over traditional imitation learning approaches.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning

DOSER introduces a diffusion-model-based framework for offline reinforcement learning that improves out-of-distribution (OOD) action detection beyond traditional penalization methods. The approach uses single-step denoising reconstruction error to identify risky actions while selectively encouraging beneficial exploration, with theoretical guarantees of convergence and empirical superiority on suboptimal datasets.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

Researchers propose Path-Coupled Bellman Flows (PCBF), a novel distributional reinforcement learning method that addresses limitations in existing flow-based approaches by using source-consistent paths and shared noise coupling to improve training stability and return distribution fidelity. The approach demonstrates competitive performance on benchmark tasks while maintaining computational efficiency through variance-reduction techniques.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Learning Visual Feature-Based World Models via Residual Latent Action

Researchers introduce Residual Latent Action (RLA), a new latent action representation learned from DINO visual features, enabling more efficient and accurate world models that predict future visual features rather than raw pixels. RLA-WM outperforms existing feature-based and video-diffusion approaches while being orders of magnitude faster, with applications in robot learning from offline video demonstrations.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

EXPO: Stable Reinforcement Learning with Expressive Policies

Researchers introduce EXPO, a reinforcement learning algorithm that trains expressive policies (like diffusion models) more efficiently by avoiding direct value optimization. The method uses a lightweight Gaussian policy to edit actions from a base policy, achieving 2-3x improvements in sample efficiency for both offline-to-online and fine-tuning scenarios.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

IPD: Boosting Sequential Policy with Imaginary Planning Distillation in Offline Reinforcement Learning

Researchers propose Imaginary Planning Distillation (IPD), a novel framework that enhances offline reinforcement learning by incorporating planning into sequential policy models. IPD uses world models and Model Predictive Control to generate optimal rollouts, training Transformer-based policies that significantly outperform existing methods on D4RL benchmarks.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 16

SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer

Researchers developed Score-Matched Actor-Critic (SMAC), an offline reinforcement learning method that enables a smooth transition to online RL algorithms without performance drops. SMAC transferred successfully in all 6 D4RL tasks tested and reduced regret by 34-58% in 4 of the 6 environments compared to the best baselines.

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10

Chunk-Guided Q-Learning

Researchers introduce Chunk-Guided Q-Learning (CGQ), a new offline reinforcement learning algorithm that combines single-step and multi-step temporal difference learning approaches. The method achieves better performance on long-horizon tasks by reducing error accumulation while maintaining fine-grained value propagation, with theoretical guarantees and empirical validation on OGBench tasks.
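
For readers unfamiliar with the two ingredients named in this summary, the sketch below contrasts a single-step TD target with a multi-step (n-step) one. It is a generic textbook illustration, not CGQ itself: the summary does not say how the paper combines the two, and the function names and the `gamma` values are assumptions made for this example.

```python
from typing import Sequence


def one_step_target(reward: float, next_value: float, gamma: float = 0.99) -> float:
    """Single-step TD target: r_t + gamma * V(s_{t+1})."""
    return reward + gamma * next_value


def n_step_target(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Multi-step TD target over a chunk of n rewards.

    Computes r_t + gamma * r_{t+1} + ... + gamma^n * V(s_{t+n})
    by folding the rewards from the last step back to the first.
    """
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
single = one_step_target(reward=1.0, next_value=2.0, gamma=0.5)
multi = n_step_target(rewards=[1.0, 1.0], bootstrap_value=0.0, gamma=0.5)
```

The single-step target bootstraps from a value estimate at every step, while the n-step target bootstraps only once, at the end of the chunk. With a one-element reward list the two functions coincide.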

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control

Research evaluates offline reinforcement learning algorithms for wireless network control, finding Conservative Q-Learning produces more robust policies under stochastic conditions than sequence-based methods. The study provides practical guidance for AI-driven network management in O-RAN and 6G systems where online exploration is unsafe.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

Researchers propose SAGE (Self-supervised Action Gating with Energies), a new method to improve diffusion planners in offline reinforcement learning by filtering out dynamically inconsistent trajectories. The approach uses a latent consistency signal to re-rank candidate actions at inference time, improving performance across locomotion, navigation, and manipulation tasks without requiring environment rollouts or policy retraining.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 2

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

Researchers introduce the Return Augmented (REAG) method for Decision Transformer frameworks, which improves offline reinforcement learning when the training data comes from different dynamics than the target domain. The method aligns return distributions between the source and target domains, and the theoretical analysis shows it achieves optimal performance levels despite dynamics shifts.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
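
As a rough illustration of what "maximizing the lower confidence bound of Q-functions" can look like, the sketch below picks an action from an ensemble of Q-estimates using mean minus a multiple of the standard deviation. This is one common way to form a lower confidence bound, not necessarily the paper's; the ensemble layout, the `beta` weight, and the function name `lcb_action` are assumptions for this example.

```python
import numpy as np


def lcb_action(q_values: np.ndarray, beta: float = 1.0) -> int:
    """Return the action index that maximizes mean(Q) - beta * std(Q).

    q_values has shape (n_ensemble, n_actions): one row per Q-function
    in a hypothetical ensemble, one column per discrete action.
    """
    mean = q_values.mean(axis=0)
    std = q_values.std(axis=0)  # ensemble disagreement as an error proxy
    return int(np.argmax(mean - beta * std))


# Hypothetical ensemble: action 1 has the highest mean but the members
# disagree strongly about it; action 2 is slightly lower but consistent.
q = np.array([
    [1.0, 4.0, 2.0],
    [1.0, -1.0, 2.0],
    [1.0, 5.0, 2.0],
])

greedy = int(np.argmax(q.mean(axis=0)))  # ignores disagreement
pessimistic = lcb_action(q, beta=1.0)    # penalizes disagreement
```

Here the greedy choice is action 1, while the pessimistic choice is action 2, which matches the summary's point about avoiding high-value actions whose estimates may carry high error.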

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 5

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies

Researchers present theoretical advances in offline reinforcement learning that extend beyond current limitations to work with parameterized policies over large or continuous action spaces. The work connects mirror descent to natural policy gradient methods and reveals a surprising unification between offline RL and imitation learning.