#reinforcement-learning News & Analysis

511 articles tagged with #reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 6

Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs

Researchers propose Chain-of-Context Learning (CCL), a novel AI framework for solving multi-task Vehicle Routing Problems that dynamically adapts to evolving constraints during decision-making. The framework outperformed existing methods across 48 VRP variants, on both familiar and unseen constraint scenarios.

AI · Bullish · arXiv – CS AI · Mar 3 · 4/10 · 7

Bridging Policy and Real-World Dynamics: LLM-Augmented Rebalancing for Shared Micromobility Systems

Researchers introduce AMPLIFY, an LLM-augmented framework for optimizing shared micromobility vehicle rebalancing in urban transportation systems. The system combines baseline rebalancing algorithms with real-time AI adaptation to handle emergent events like demand surges and regulatory changes, showing improved performance in Chicago e-scooter data testing.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 5

Hereditary Geometric Meta-RL: Nonlocal Generalization via Task Symmetries

Researchers developed a new Meta-Reinforcement Learning approach that uses geometric symmetries in task spaces to enable broader generalization beyond local smoothness assumptions. The method converts Meta-RL into symmetry discovery rather than smooth extrapolation, allowing agents to generalize across wider regions of task space with improved sample efficiency.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning

Researchers propose Coupled Policy Optimization (CPO), a new reinforcement learning method that regulates policy diversity through KL constraints to improve exploration efficiency in large-scale parallel environments. The method outperforms existing baselines like PPO and SAPG across multiple tasks, demonstrating that controlled diverse exploration is key to stable and sample-efficient learning.
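
To make the idea of a KL constraint between policies concrete, here is a minimal sketch for discrete action distributions. It is not the CPO algorithm from the paper: the function names, the direction of the divergence, and the `max_kl` budget are all assumptions chosen for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def within_kl_budget(follower, leader, max_kl):
    """Hypothetical check: is the follower policy close enough to the leader?

    The direction KL(follower || leader) and the budget are assumptions,
    not details taken from the summarized paper.
    """
    return kl_divergence(follower, leader) <= max_kl

leader = [0.9, 0.1]      # leader policy strongly prefers action 0
follower = [0.5, 0.5]    # follower explores both actions equally

print(kl_divergence(follower, leader))
print(within_kl_budget(follower, leader, max_kl=0.1))
```

A tighter budget keeps the policies similar, while a looser one allows more diverse exploration; the summary describes regulating that trade-off as the method's core idea.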

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 6

Discrete World Models via Regularization

Researchers introduce Discrete World Models via Regularization (DWMR), a new method for learning Boolean representations of environments without requiring reconstruction or contrastive learning. The approach uses specialized regularizers to maximize entropy and independence while enforcing locality constraints, showing superior performance on benchmarks with combinatorial structure.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Federated Agentic AI for Wireless Networks: Fundamentals, Approaches, and Applications

Researchers propose federated agentic AI approaches for wireless networks to address challenges of centralized AI architectures including high communication overhead and privacy risks. The paper introduces how federated learning can enhance autonomous AI systems in distributed wireless environments through collaborative learning without raw data exchange.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

Researchers developed RL-CMSA, a hybrid reinforcement learning approach for solving the min-max Multiple Traveling Salesman Problem that combines probabilistic clustering, exact optimization, and solution refinement. The method outperforms existing algorithms by balancing exploration and exploitation to minimize the longest tour across multiple salesmen.
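
The min-max objective mentioned above can be sketched in a few lines. This only evaluates a candidate solution and is not RL-CMSA itself; the Euclidean distances, the shared depot, and the function names are assumptions made for the example.

```python
import math

def tour_length(depot, stops):
    """Length of the closed tour depot -> stops... -> depot (Euclidean)."""
    path = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def min_max_objective(depot, tours):
    """Min-max mTSP objective: the length of the longest salesman tour."""
    return max(tour_length(depot, stops) for stops in tours)

depot = (0.0, 0.0)
balanced = [[(3.0, 0.0)], [(0.0, 4.0)]]        # one city per salesman
unbalanced = [[(3.0, 0.0), (0.0, 4.0)], []]    # one salesman does everything

print(min_max_objective(depot, balanced))
print(min_max_objective(depot, unbalanced))
```

Splitting the cities lowers the objective because the score depends only on the longest tour, not on the total distance travelled.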

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
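
As a rough illustration of choosing actions by a lower confidence bound, the sketch below scores each action by the mean of an ensemble of Q-estimates minus a multiple of their spread. The ensemble layout, the `beta` coefficient, and the function name are assumptions; the paper's actual estimator may differ.

```python
import statistics

def lcb_action(q_estimates, beta=1.0):
    """Return the index of the action with the highest lower confidence bound.

    q_estimates: one list per action, holding that action's Q-value from
                 each ensemble member (a hypothetical layout for this sketch).
    beta: pessimism coefficient (assumed); larger values penalize
          disagreement between ensemble members more strongly.
    """
    def lcb(values):
        return statistics.mean(values) - beta * statistics.pstdev(values)

    scores = [lcb(values) for values in q_estimates]
    return max(range(len(scores)), key=scores.__getitem__)

q = [
    [1.0, 1.1, 0.9],    # action 0: modest value, members agree
    [3.0, -1.0, 2.5],   # action 1: higher mean, members disagree
]

print(lcb_action(q, beta=1.0))  # pessimistic choice
print(lcb_action(q, beta=0.0))  # plain greedy choice on the mean
```

With `beta=0` the rule reduces to greedy selection on the mean, so the pessimism term is what steers the choice away from high-value but uncertain actions.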

AI · Bullish · arXiv – CS AI · Mar 2 · 4/10 · 6

Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance

Researchers developed a bi-level AI optimization framework using reinforcement learning to improve winter road maintenance operations on UK highway networks. The system strategically partitions road networks and optimizes vehicle routing while reducing travel times below two hours and minimizing carbon emissions.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 5

Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning

Researchers propose BDGxRL, a novel framework using Diffusion Schrödinger Bridge to enable reinforcement learning agents to transfer policies across different domains without direct target environment access. The method aligns source domain transitions with target dynamics through offline demonstrations and introduces reward modulation for consistent learning.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 5

Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parameteric Policies

Researchers present theoretical advances in offline reinforcement learning that extend beyond current limitations to work with parameterized policies over large or continuous action spaces. The work connects mirror descent to natural policy gradient methods and reveals a surprising unification between offline RL and imitation learning.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning

Researchers propose ACWI, a new reinforcement learning framework that dynamically balances intrinsic and extrinsic rewards through adaptive scaling coefficients. The system uses a lightweight Beta Network to optimize exploration in sparse reward environments, demonstrating improved sample efficiency and stability in MiniGrid experiments.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Researchers propose OVMSE, a new framework for Offline-to-Online Multi-Agent Reinforcement Learning that addresses key challenges in transitioning from offline training to online fine-tuning. The framework introduces Offline Value Function Memory and Sequential Exploration strategies to improve sample efficiency and performance in multi-agent environments.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Less is more -- the Dispatcher/Executor principle for multi-task Reinforcement Learning

Researchers propose a dispatcher/executor principle for multi-task Reinforcement Learning that partitions controllers into task-understanding and device-specific components connected by a regularized communication channel. This structural approach aims to improve generalization and data efficiency as an alternative to simply scaling large neural networks with vast datasets.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 7

LLM-hRIC: LLM-empowered Hierarchical RAN Intelligent Control for O-RAN

Researchers propose LLM-hRIC, a new framework that combines large language models with hierarchical radio access network intelligent controllers to improve O-RAN networks. The system uses LLM-powered non-real-time controllers for strategic guidance and reinforcement learning for near-real-time decision making in network management.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search

Researchers propose a new framework for feature selection that uses permutation-invariant embedding and reinforcement learning to address limitations in current methods. The approach combines an encoder-decoder paradigm to preserve feature relationships without order bias and employs policy-based RL to explore embedding spaces without convexity assumptions.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning

Researchers introduce iterated Shared Q-Learning (iS-QL), a new reinforcement learning method that bridges target-free and target-based approaches by using only the last linear layer as a target network while sharing other parameters. The technique achieves comparable performance to traditional target-based methods while maintaining the memory efficiency of target-free approaches.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

Researchers propose a new multi-agent reinforcement learning framework that uses three cooperative agents with attention mechanisms to automate feature transformation for machine learning models. The approach addresses key limitations in existing automated feature engineering methods, including dynamic feature expansion instability and insufficient agent cooperation.

AI · Neutral · Hugging Face Blog · Aug 5 · 3/10 · 8

Proximal Policy Optimization (PPO)

The title refers to Proximal Policy Optimization (PPO), a reinforcement learning algorithm used in AI systems. No article body was available to summarize.

AI · Neutral · Hugging Face Blog · May 18 · 3/10 · 5

An Introduction to Q-Learning Part 1

Judging by the title, this is an educational article introducing Q-Learning, a reinforcement learning algorithm commonly used in AI and machine learning applications. No article body was available to summarize.

AI · Neutral · OpenAI News · Mar 20 · 3/10 · 5

Variance reduction for policy gradient with action-dependent factorized baselines

Judging by the title, this is a research paper on policy gradient methods in reinforcement learning, focused on reducing variance with action-dependent factorized baselines. No article body was available, so specific findings and implications cannot be assessed.

AI · Neutral · Hugging Face Blog · Aug 14 · 1/10 · 5

Kimina-Prover-RL

The title 'Kimina-Prover-RL' suggests a technical development related to reinforcement learning and proof systems. No article body was available, so the technology, its applications, and its market implications cannot be determined.

AI · Neutral · Hugging Face Blog · Jun 12 · 1/10 · 7

Putting RL back in RLHF

Only the title 'Putting RL back in RLHF' was available; the article body was missing or inaccessible, so no meaningful analysis is possible.

AI · Neutral · Hugging Face Blog · Oct 24 · 1/10 · 6

The N Implementation Details of RLHF with PPO

The title refers to implementation details of Reinforcement Learning from Human Feedback (RLHF) using Proximal Policy Optimization (PPO). No article body was available to summarize.

AI · Neutral · Hugging Face Blog · Dec 9 · 1/10 · 6

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Judging by the title, the article covers Reinforcement Learning from Human Feedback (RLHF), a machine learning technique for training AI models on human preferences and feedback. No article body was available to summarize.
