10 articles tagged with #offline-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠 Researchers achieved breakthrough sample complexity improvements for offline reinforcement learning algorithms using f-divergence regularization, particularly for contextual bandits. The study demonstrates optimal O(ε⁻¹) sample complexity under single-policy concentrability conditions, significantly improving upon existing bounds.
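The summary does not include the paper's estimator, but the role of an f-divergence regularizer can be sketched with its KL instance on a toy logged contextual-bandit dataset. Everything below (`regularized_value`, the data, the penalty weight) is an illustrative assumption, not the paper's method:

```python
import math

def regularized_value(logged, target_probs, beta=1.0):
    """Importance-weighted value of pi minus a KL(pi || mu) penalty.

    logged: (action, reward, mu_prob) tuples collected by a behavior policy mu.
    target_probs: action -> pi(action), the policy being evaluated.
    """
    value, kl = 0.0, 0.0
    for action, reward, mu_prob in logged:
        w = target_probs[action] / mu_prob   # importance weight pi/mu
        value += w * reward                  # off-policy reward estimate
        kl += w * math.log(w)                # plug-in estimate of KL(pi || mu)
    n = len(logged)
    return value / n - beta * kl / n

data = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 0.5, 0.5)]
uniform = {0: 0.5, 1: 0.5}                   # pi == mu, so the penalty vanishes
print(regularized_value(data, uniform))      # prints 0.625
```

The penalty keeps the evaluated policy close to the data-collecting policy, which is exactly the kind of regularization that makes offline guarantees under single-policy concentrability possible.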
AI Neutral · arXiv – CS AI · 3d ago · 6/10
🧠 Researchers introduce WOMBET, a framework that improves reinforcement learning efficiency in robotics by generating synthetic training data from a world model in source tasks and selectively transferring it to target tasks. The approach combines offline-to-online learning with uncertainty-aware planning to reduce data collection costs while maintaining robustness.
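"Selectively transferring" synthetic data under uncertainty can be sketched as a simple filter: keep only synthetic transitions whose model uncertainty (e.g. ensemble disagreement) is below a threshold. The names and numbers here are hypothetical, not taken from the WOMBET paper:

```python
def select_transfers(transitions, uncertainty, max_uncertainty=0.1):
    """Keep synthetic transitions the world model is confident about."""
    return [t for t in transitions if uncertainty(t) <= max_uncertainty]

# toy synthetic transitions tagged with a model-uncertainty score
synthetic = [("s1", 0.02), ("s2", 0.30), ("s3", 0.08)]
kept = select_transfers(synthetic, uncertainty=lambda t: t[1])
print([name for name, _ in kept])   # prints ['s1', 's3']
```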
AI Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠 Researchers have developed OPRIDE, a new algorithm for offline preference-based reinforcement learning that significantly improves query efficiency. The algorithm addresses key challenges of inefficient exploration and overoptimization through principled exploration strategies and discount scheduling mechanisms.
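A common way to make preference queries efficient is to ask about the trajectory pair whose predicted preference is most uncertain (probability closest to 0.5). This is a standard active-querying heuristic, shown here as a generic sketch rather than OPRIDE's exact strategy:

```python
def most_uncertain_pair(pairs, predict_pref):
    """Pick the pair whose predicted preference probability is nearest 0.5."""
    return min(pairs, key=lambda p: abs(predict_pref(p) - 0.5))

pairs = [("a", "b"), ("a", "c"), ("b", "c")]
# toy preference-model predictions: P(first trajectory preferred)
preds = {("a", "b"): 0.9, ("a", "c"): 0.55, ("b", "c"): 0.2}
print(most_uncertain_pair(pairs, preds.get))   # prints ('a', 'c')
```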
AI Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠 Researchers propose OM2P, a new offline multi-agent reinforcement learning algorithm that achieves efficient one-step action sampling using mean-flow models. The approach delivers up to 3.8x reduction in GPU memory usage and 10.8x speed-up in training time compared to existing diffusion and flow-based models.
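The speed-up comes from replacing many-step ODE integration with a single average-velocity update. The toy linear velocity field below is illustrative only, not OM2P's learned model; the point is that the one-step sampler matches the iterative one with a single function evaluation:

```python
import math

def velocity(x, target=2.0):
    # toy velocity field: pulls the sample toward `target`
    return target - x

def euler_sample(x0, steps):
    # iterative flow sampling: `steps` velocity evaluations
    x, dt = x0, 1.0 / steps
    for _ in range(steps):
        x += dt * velocity(x)
    return x

def mean_flow_sample(x0, target=2.0):
    # one-step sampling with the average velocity over t in [0, 1];
    # for dx/dt = target - x, the exact flow gives
    # x(1) = target + (x0 - target) * e^{-1}
    avg_velocity = (target + (x0 - target) * math.exp(-1)) - x0
    return x0 + avg_velocity

print(euler_sample(0.0, 100))    # 100 evaluations
print(mean_flow_sample(0.0))     # 1 evaluation, nearly the same sample
```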
AI Neutral · arXiv – CS AI · Mar 2 · 7/10
🧠 Researchers developed an offline-to-online reinforcement learning framework that improves robot control robustness through adversarial fine-tuning. The method trains policies on clean datasets, then applies action perturbations during fine-tuning to build resilience against actuator faults and environmental uncertainties.
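Action perturbation during fine-tuning can be sketched as an environment wrapper that randomly corrupts the agent's action to simulate actuator faults. `FaultyActuatorWrapper` and the toy reward below are hypothetical, not the paper's implementation:

```python
import random

class FaultyActuatorWrapper:
    """Perturb a fraction of actions to simulate actuator faults."""

    def __init__(self, step_fn, fault_prob=0.2, noise_scale=0.5, seed=0):
        self.step_fn = step_fn            # underlying env step: action -> reward
        self.fault_prob = fault_prob
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)

    def step(self, action):
        if self.rng.random() < self.fault_prob:
            # fault: add Gaussian noise to the commanded action
            action = action + self.rng.gauss(0.0, self.noise_scale)
        return self.step_fn(action)

# toy environment: reward is highest when the executed action is exactly 1.0
env = FaultyActuatorWrapper(step_fn=lambda a: -abs(a - 1.0))
rewards = [env.step(1.0) for _ in range(1000)]
print(sum(rewards) / len(rewards))   # slightly negative: faults cost reward
```

A policy fine-tuned against such a wrapper is pushed to choose actions that remain acceptable even when perturbed.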
AI Bullish · arXiv – CS AI · Feb 27 · 6/10
🧠 Researchers have developed LLM4Cov, an offline learning framework that enables AI agents to generate high-coverage hardware verification testbenches without expensive online reinforcement learning. A compact 4B-parameter model achieved a 69.2% coverage pass rate, outperforming larger models and demonstrating efficient learning from execution feedback in hardware verification tasks.
AI Neutral · arXiv – CS AI · Mar 17 · 4/10
🧠 Researchers introduce Safe Flow Q-Learning (SafeFQL), a new offline safe reinforcement learning method that combines Hamilton-Jacobi reachability with flow policies for safety-critical real-time control. The method achieves better safety performance with lower inference latency compared to existing diffusion-based approaches, making it more suitable for real-time deployment.
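Reachability-based methods are typically deployed as a safety filter: a learned safety value vetoes unsafe proposed actions and substitutes a safe fallback. The value function below is a toy stand-in, not SafeFQL's learned Hamilton-Jacobi value:

```python
def safety_value(state, action):
    # toy safety value: negative when the action would push the
    # next state outside the safe set |state + action| <= 1
    return 1.0 - abs(state + action)

def filtered_action(state, proposed, fallback=0.0, threshold=0.0):
    """Pass the proposed action through only if it is deemed safe."""
    if safety_value(state, proposed) >= threshold:
        return proposed
    return fallback

print(filtered_action(0.5, 0.3))   # safe -> prints 0.3
print(filtered_action(0.5, 0.8))   # unsafe -> falls back, prints 0.0
```

Because the filter is a single value-function evaluation, it adds almost no inference latency, which is the property the summary highlights.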
AI Neutral · OpenAI News · Nov 5 · 4/10
🧠 The article discusses a model-based control approach for efficient learning and exploration that combines online planning with offline learning. This methodology aims to optimize the balance between computational efficiency and learning effectiveness in AI systems.
AI Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers developed COffeE-PSRO, a new algorithm that applies offline reinforcement learning to game-theoretic multi-agent systems. The approach extends Policy Space Response Oracles by incorporating uncertainty quantification and conservative exploration to find equilibrium strategies from fixed datasets without online interaction.
AI Neutral · arXiv – CS AI · Mar 2 · 4/10
🧠 Researchers propose OVMSE, a new framework for Offline-to-Online Multi-Agent Reinforcement Learning that addresses key challenges in transitioning from offline training to online fine-tuning. The framework introduces Offline Value Function Memory and Sequential Exploration strategies to improve sample efficiency and performance in multi-agent environments.
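One plausible reading of an "offline value function memory" (a sketch under that assumption, not the OVMSE definition) is to blend a frozen offline Q-estimate with the online one during fine-tuning, so early online updates do not overwrite offline knowledge:

```python
def blended_q(q_offline, q_online, step, warmup=100):
    """Shift trust from the frozen offline Q-value to the online one."""
    w = min(step / warmup, 1.0)          # 0 at the start, 1 after warmup
    return (1 - w) * q_offline + w * q_online

print(blended_q(2.0, 4.0, step=0))      # all offline -> prints 2.0
print(blended_q(2.0, 4.0, step=50))     # halfway -> prints 3.0
print(blended_q(2.0, 4.0, step=200))    # all online -> prints 4.0
```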