511 articles tagged with #reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Neutral · arXiv ↗ · CS AI · Mar 34/106
🧠 Researchers propose Chain-of-Context Learning (CCL), a novel AI framework for solving multi-task Vehicle Routing Problems that dynamically adapts to evolving constraints during decision-making. The framework outperformed existing methods across 48 VRP variants, showing superior performance on both familiar and unseen constraint scenarios.
AI Bullish · arXiv ↗ · CS AI · Mar 34/107
🧠 Researchers introduce AMPLIFY, an LLM-augmented framework for optimizing shared micromobility vehicle rebalancing in urban transportation systems. The system combines baseline rebalancing algorithms with real-time AI adaptation to handle emergent events like demand surges and regulatory changes, showing improved performance in Chicago e-scooter data testing.
AI Neutral · arXiv ↗ · CS AI · Mar 34/105
🧠 Researchers developed a new Meta-Reinforcement Learning approach that exploits geometric symmetries in task spaces to enable broader generalization beyond local smoothness assumptions. The method recasts Meta-RL as symmetry discovery rather than smooth extrapolation, allowing agents to generalize across wider regions of task space with improved sample efficiency.
AI Neutral · arXiv ↗ · CS AI · Mar 34/104
🧠 Researchers propose Coupled Policy Optimization (CPO), a new reinforcement learning method that regulates policy diversity through KL constraints to improve exploration efficiency in large-scale parallel environments. The method outperforms existing baselines like PPO and SAPG across multiple tasks, demonstrating that controlled diverse exploration is key to stable and sample-efficient learning.
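The KL-regulated diversity idea can be sketched in a few lines. This is a minimal illustration, not CPO's actual formulation: the minimum-gap penalty shape, the `target_kl` threshold, and the coefficient are all assumptions made here to show how a KL constraint can keep parallel policies from collapsing onto one another.

```python
import math

def kl(p, q):
    """KL divergence between two categorical distributions (lists of probs)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def diversity_bonus(policies, target_kl=0.1, coef=1.0):
    """Penalize pairs of parallel policies whose KL gap falls below target_kl.

    Pairs closer than target_kl incur a penalty proportional to the shortfall;
    pairs that are already diverse enough contribute nothing, so diversity is
    regulated rather than maximized without bound.
    """
    bonus, n = 0.0, 0
    for i in range(len(policies)):
        for j in range(len(policies)):
            if i != j:
                gap = target_kl - kl(policies[i], policies[j])
                bonus -= coef * max(0.0, gap)  # only collapsed pairs are penalized
                n += 1
    return bonus / n
```

Two identical policies sit at zero KL and receive the full penalty, while two clearly distinct policies receive none.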
AI Neutral · arXiv ↗ · CS AI · Mar 34/106
🧠 Researchers introduce Discrete World Models via Regularization (DWMR), a new method for learning Boolean representations of environments without requiring reconstruction or contrastive learning. The approach uses specialized regularizers to maximize entropy and independence while enforcing locality constraints, showing superior performance on benchmarks with combinatorial structure.
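The two regularizer families the summary names can be illustrated concretely. This sketch assumes the Boolean units are relaxed to probabilities in [0, 1]; the binary-entropy term and the pairwise-covariance penalty are generic stand-ins, not DWMR's actual regularizers.

```python
import math

def entropy_reg(p):
    """Binary entropy of one relaxed Boolean unit, to be maximized so the
    bit is actually used rather than stuck at 0 or 1."""
    eps = 1e-9
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))

def independence_reg(batch):
    """Penalize statistical dependence between bit pairs across a batch.

    For independent bits, E[b_i * b_j] factors as E[b_i] * E[b_j]; the
    penalty is the squared gap between the two, summed over all pairs.
    """
    n, d = len(batch), len(batch[0])
    means = [sum(row[k] for row in batch) / n for k in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            e_ij = sum(row[i] * row[j] for row in batch) / n
            penalty += (e_ij - means[i] * means[j]) ** 2
    return penalty
```

A bit at p = 0.5 attains the maximum entropy ln 2, and a batch whose two bits enumerate all four combinations incurs zero independence penalty.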
AI Neutral · arXiv ↗ · CS AI · Mar 34/104
🧠 Researchers propose federated agentic AI approaches for wireless networks to address challenges of centralized AI architectures, including high communication overhead and privacy risks. The paper shows how federated learning can enhance autonomous AI systems in distributed wireless environments through collaborative learning without raw data exchange.
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers developed RL-CMSA, a hybrid reinforcement learning approach for solving the min-max Multiple Traveling Salesman Problem that combines probabilistic clustering, exact optimization, and solution refinement. The method outperforms existing algorithms by balancing exploration and exploitation to minimize the longest tour across multiple salesmen.
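The min-max objective the summary describes is worth pinning down, since it differs from the usual total-distance mTSP. The sketch below shows only the objective being minimized, not the clustering/exact-solver pipeline itself; the distance-matrix representation is an assumption for illustration.

```python
def minmax_cost(tours, dist):
    """Min-max mTSP objective: a solution is scored by its LONGEST tour.

    Minimizing this balances workload across salesmen, unlike the classic
    objective of minimizing the sum of all tour lengths.
    """
    def tour_len(tour):
        return sum(dist[a][b] for a, b in zip(tour, tour[1:]))
    return max(tour_len(t) for t in tours)
```

For two salesmen with tours of length 4 and 10, the solution cost is 10: shortening the long tour helps, shortening the short one does not.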
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
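Lower-confidence-bound action selection can be sketched with a Q-ensemble. This is a generic illustration of the pessimism principle the summary states, not the paper's method: the ensemble construction and the mean-minus-k-std bound are assumptions made here.

```python
import statistics

def lcb_action(q_ensemble, actions, k=2.0):
    """Pick the action maximizing a lower confidence bound over a Q-ensemble.

    q_ensemble: callables mapping an action to a Q-value estimate.
    Actions whose estimates disagree (high std) get a low bound, so the
    policy avoids high-value actions that may merely be high-error.
    """
    def lcb(a):
        qs = [q(a) for q in q_ensemble]
        return statistics.mean(qs) - k * statistics.pstdev(qs)
    return max(actions, key=lcb)
```

With disagreement penalized (k = 2), a reliably mediocre action beats an action whose high mean estimate comes from one wildly optimistic ensemble member; with k = 0 the optimistic action wins.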
AI Bullish · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers developed a bi-level AI optimization framework using reinforcement learning to improve winter road maintenance operations on UK highway networks. The system strategically partitions road networks and optimizes vehicle routing, keeping route travel times below two hours while minimizing carbon emissions.
AI Neutral · arXiv ↗ · CS AI · Mar 24/105
🧠 Researchers propose BDGxRL, a novel framework using Diffusion Schrödinger Bridge to enable reinforcement learning agents to transfer policies across different domains without direct target environment access. The method aligns source domain transitions with target dynamics through offline demonstrations and introduces reward modulation for consistent learning.
AI Neutral · arXiv ↗ · CS AI · Mar 24/105
🧠 Researchers present theoretical advances in offline reinforcement learning that extend beyond current limitations to work with parameterized policies over large or continuous action spaces. The work connects mirror descent to natural policy gradient methods and reveals a surprising unification between offline RL and imitation learning.
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers propose ACWI, a new reinforcement learning framework that dynamically balances intrinsic and extrinsic rewards through adaptive scaling coefficients. The system uses a lightweight Beta Network to optimize exploration in sparse reward environments, demonstrating improved sample efficiency and stability in MiniGrid experiments.
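The reward-shaping structure the summary describes can be sketched directly. The `DecayingBeta` class below is an illustrative stand-in for the learned Beta Network, with a hypothetical decay rule (shrink the coefficient whenever extrinsic reward appears); only the shaping equation itself follows from the summary.

```python
def shaped_reward(r_ext, r_int, beta):
    """Combine extrinsic and intrinsic reward with an adaptive coefficient."""
    return r_ext + beta * r_int

class DecayingBeta:
    """Stand-in for ACWI's Beta Network (assumption, not the paper's model):
    shrink beta whenever extrinsic reward shows up, so curiosity-driven
    exploration fades once the true task signal becomes available."""
    def __init__(self, beta=1.0, decay=0.9):
        self.beta, self.decay = beta, decay
    def update(self, r_ext):
        if r_ext > 0:
            self.beta *= self.decay
        return self.beta
```

Early in training (no extrinsic reward yet), intrinsic reward is weighted fully; each extrinsically rewarded step then damps the exploration term.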
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers propose OVMSE, a new framework for Offline-to-Online Multi-Agent Reinforcement Learning that addresses key challenges in transitioning from offline training to online fine-tuning. The framework introduces Offline Value Function Memory and Sequential Exploration strategies to improve sample efficiency and performance in multi-agent environments.
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers propose a dispatcher/executor principle for multi-task Reinforcement Learning that partitions controllers into task-understanding and device-specific components connected by a regularized communication channel. This structural approach aims to improve generalization and data efficiency as an alternative to simply scaling large neural networks with vast datasets.
AI Neutral · arXiv ↗ · CS AI · Mar 24/107
🧠 Researchers propose LLM-hRIC, a new framework that combines large language models with hierarchical radio access network intelligent controllers to improve O-RAN networks. The system uses LLM-powered non-real-time controllers for strategic guidance and reinforcement learning for near-real-time decision making in network management.
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers propose a new framework for feature selection that uses permutation-invariant embedding and reinforcement learning to address limitations in current methods. The approach combines an encoder-decoder paradigm to preserve feature relationships without order bias and employs policy-based RL to explore embedding spaces without convexity assumptions.
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers introduce iterated Shared Q-Learning (iS-QL), a new reinforcement learning method that bridges target-free and target-based approaches by using only the last linear layer as a target network while sharing other parameters. The technique achieves comparable performance to traditional target-based methods while maintaining the memory efficiency of target-free approaches.
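The last-layer-only target idea can be made concrete with a toy value network. This is a structural sketch under stated assumptions, not the paper's implementation: the feature map stands in for the shared trunk, and only the final linear head is duplicated for bootstrap targets.

```python
class SharedQ:
    """Online and target Q-functions share every layer except the final head.

    Only `w_target` is an extra copy, so the memory overhead is one linear
    layer rather than a full duplicate network, while bootstrap targets
    still lag behind online updates as in target-based methods.
    """
    def __init__(self, w):
        self.w = list(w)           # online linear head (trained)
        self.w_target = list(w)    # frozen head used for bootstrap targets

    def features(self, s):
        # Stand-in for the shared trunk; in practice this is a deep network.
        return [s, s * s]

    def q(self, s):
        return sum(f * wi for f, wi in zip(self.features(s), self.w))

    def q_target(self, s):
        return sum(f * wi for f, wi in zip(self.features(s), self.w_target))

    def sync(self):
        """Periodically copy the online head into the target head."""
        self.w_target = list(self.w)
```

Updating the online head changes `q` immediately, while `q_target` stays fixed until `sync()` is called, which is exactly the stabilizing lag a full target network provides.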
AI Neutral · arXiv ↗ · CS AI · Mar 24/106
🧠 Researchers propose a new multi-agent reinforcement learning framework that uses three cooperative agents with attention mechanisms to automate feature transformation for machine learning models. The approach addresses key limitations in existing automated feature engineering methods, including dynamic feature expansion instability and insufficient agent cooperation.
AI Neutral · Hugging Face Blog · Aug 53/108
🧠 The article title references Proximal Policy Optimization (PPO), a reinforcement learning algorithm used in AI systems. However, no article body content was provided for analysis.
AI Neutral · Hugging Face Blog · May 183/105
🧠 This appears to be an educational article introducing Q-Learning, a reinforcement learning algorithm commonly used in AI and machine learning applications. However, the article body content was not provided for analysis.
AI Neutral · OpenAI News · Mar 203/105
🧠 This appears to be a research paper on policy gradient methods in reinforcement learning, specifically focusing on variance reduction techniques using action-dependent factorized baselines. The article lacks content details, making it difficult to assess specific findings or implications.
AI Neutral · Hugging Face Blog · Aug 141/105
🧠 The article title 'Kimina-Prover-RL' suggests a technical development related to reinforcement learning and proof systems. However, without article content, no specific details about the technology, its applications, or market implications can be determined.
AI Neutral · Hugging Face Blog · Jun 121/107
🧠 The article appears to be incomplete or inaccessible, with only the title 'Putting RL back in RLHF' provided without any article body content. Without the actual content, it's not possible to provide meaningful analysis of this AI-related topic.
AI Neutral · Hugging Face Blog · Oct 241/106
🧠 The article title references implementation details of Reinforcement Learning from Human Feedback (RLHF) using Proximal Policy Optimization (PPO), but the article body appears to be empty or incomplete.
AI Neutral · Hugging Face Blog · Dec 91/106
🧠 The article appears to be about Reinforcement Learning from Human Feedback (RLHF), a machine learning technique used to train AI models based on human preferences and feedback. However, no article body content was provided for analysis.