y0news

#reinforcement-learning News & Analysis

511 articles tagged with #reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 16 · 5/10

Accelerating Residual Reinforcement Learning with Uncertainty Estimation

Researchers developed an improved Residual Reinforcement Learning method that uses uncertainty estimation to enhance sample efficiency and work with stochastic base policies. The approach outperformed existing methods in simulation benchmarks and demonstrated successful zero-shot sim-to-real transfer in real-world deployments.
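As a hedged illustration of the general residual-RL idea (not this paper's specific method), the pattern is: a learned residual correction is added to a fixed base policy's action, and an uncertainty estimate, here the spread across a hypothetical ensemble of residual heads, scales how much the residual is trusted. All names and the gating rule below are assumptions.

```python
import numpy as np

def base_policy(obs):
    # stand-in base controller, e.g. a scripted or pretrained policy
    return np.tanh(obs)

def residual_ensemble(obs, n_heads=5, rng=None):
    # hypothetical ensemble of small residual heads; real methods train these
    rng = rng or np.random.default_rng(0)
    return np.stack([0.1 * np.tanh(obs + rng.normal(0, 0.05, obs.shape))
                     for _ in range(n_heads)])

def act(obs):
    res = residual_ensemble(obs)
    mean_res = res.mean(axis=0)
    uncertainty = res.std(axis=0)              # high spread -> low trust
    gate = 1.0 / (1.0 + 10.0 * uncertainty)    # assumed gating rule
    return base_policy(obs) + gate * mean_res

obs = np.array([0.3, -0.8])
action = act(obs)
print(action.shape)  # (2,)
```

Where the ensemble disagrees, the gate shrinks toward zero and the agent falls back on the base policy, which is one plausible route to the sample-efficiency gains the summary describes.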

AI · Neutral · arXiv – CS AI · Mar 11 · 5/10

When Learning Rates Go Wrong: Early Structural Signals in PPO Actor-Critic

Researchers introduce the Overfitting-Underfitting Indicator (OUI) to analyze learning rate sensitivity in PPO reinforcement learning systems. The metric can identify problematic learning rates early in training by measuring neural activation patterns, enabling more efficient hyperparameter screening without full training runs.
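The paper's OUI metric is defined from neural activation patterns; its exact formula is not reproduced here. As a hedged illustration of the idea, one simple statistic in that spirit is the fraction of ReLU units active on a batch, which tends to collapse toward 0 (dead units) or 1 (saturated) early in training when the learning rate is badly chosen:

```python
import numpy as np

def active_fraction(x, w, b):
    # x: (batch, in_dim) inputs; w, b: one linear layer feeding a ReLU
    return float(((x @ w + b) > 0).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 32))
w = rng.normal(scale=0.1, size=(32, 64))

healthy = active_fraction(x, w, b=0.0)       # roughly half the units fire
collapsed = active_fraction(x, w, b=-10.0)   # pre-activations far negative
print(healthy, collapsed)
```

Tracking such a statistic over the first few thousand updates, rather than completing full training runs, is the kind of cheap early signal the summary describes.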

AI · Neutral · arXiv – CS AI · Mar 11 · 5/10

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Researchers developed a new framework for training robust AI policies in partially observable environments where adversaries can manipulate hidden initial conditions. The study demonstrates improved robustness through targeted exposure to shifted latent distributions, reducing performance gaps in benchmark tests.

AI · Neutral · arXiv – CS AI · Mar 9 · 4/10

Partial Policy Gradients for RL in LLMs

Researchers propose a new reinforcement learning approach for large language models that optimizes for subsets of future rewards rather than full sequences. The method enables comparison of different policy classes and shows varying effectiveness across different conversational AI alignment tasks.

AI · Neutral · arXiv – CS AI · Mar 9 · 4/10

A Reference Architecture of Reinforcement Learning Frameworks

Researchers propose a reference architecture for reinforcement learning frameworks after analyzing 18 state-of-the-practice implementations. The study identifies recurring architectural components and relationships to establish a common basis for comparison, evaluation, and integration across RL frameworks.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

RVN-Bench: A Benchmark for Reactive Visual Navigation

Researchers introduced RVN-Bench, a new benchmark for indoor visual navigation with mobile robots that emphasizes collision avoidance in cluttered environments. Built on the Habitat 2.0 simulator with high-fidelity HM3D scenes, it provides tools for training and evaluating AI agents that navigate using only visual observations, without prior maps.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Selecting Offline Reinforcement Learning Algorithms for Stochastic Network Control

Research evaluates offline reinforcement learning algorithms for wireless network control, finding Conservative Q-Learning produces more robust policies under stochastic conditions than sequence-based methods. The study provides practical guidance for AI-driven network management in O-RAN and 6G systems where online exploration is unsafe.
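For readers unfamiliar with why Conservative Q-Learning suits settings where online exploration is unsafe: alongside the usual TD loss, CQL adds a regularizer that pushes down Q-values on actions not supported by the logged data. A hedged numeric sketch of that regularizer at a single state (shapes and values are illustrative, not from the paper):

```python
import numpy as np

def cql_penalty(q_values, dataset_action):
    # q_values: (n_actions,) Q estimates at one logged state
    lse = np.log(np.sum(np.exp(q_values)))   # soft maximum over all actions
    return lse - q_values[dataset_action]    # minimizing this suppresses
                                             # Q on actions absent from data

q = np.array([1.0, 3.0, 0.5])   # toy Q-values for 3 discrete actions
print(cql_penalty(q, dataset_action=1))  # small: logged action already max
print(cql_penalty(q, dataset_action=2))  # larger: data action has low Q
```

The penalty is near zero when the policy already prefers in-dataset actions, which is why CQL tends to stay robust when the deployed environment is stochastic and off-dataset actions are risky.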

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Researchers trained a compact 1.5B parameter language model to solve beam physics problems using reinforcement learning with verifiable rewards, achieving 66.7% improvement in accuracy. However, the model learned pattern-matching templates rather than true physics reasoning, failing to generalize to topological changes despite mastering the same underlying equations.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Multi-Agent-Based Simulation of Archaeological Mobility in Uneven Landscapes

Researchers developed a multi-agent simulation framework using reinforcement learning to model archaeological mobility patterns in complex terrain. The system combines global path planning with local adaptation to simulate human and animal movement in historical landscapes, demonstrated through pursuit scenarios and transport analysis.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

UrbanHuRo: A Two-Layer Human-Robot Collaboration Framework for the Joint Optimization of Heterogeneous Urban Services

Researchers propose UrbanHuRo, a two-layer human-robot collaboration framework that jointly optimizes different urban services like delivery and sensing. The system demonstrated 29.7% improvement in sensing coverage and 39.2% increase in courier income while reducing overdue orders through coordinated optimization of heterogeneous services.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation

Researchers propose DSRM-HRL, a new framework that uses diffusion models to purify user preference data and hierarchical reinforcement learning to balance recommendation accuracy with fairness. The system addresses bias in interactive recommendation systems by separating state estimation from decision-making, achieving better outcomes on both utility and exposure equity.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Researchers propose a standardized framework for classifying and evaluating memory capabilities in reinforcement learning agents, drawing from cognitive science concepts. The paper addresses confusion around memory terminology in RL and provides practical definitions for different memory types along with robust experimental methodologies.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

Researchers present AutoQD, a new AI method that automatically discovers diverse behavioral policies without requiring hand-crafted descriptors. The approach uses mathematical embeddings of policy occupancy measures to enable Quality-Diversity optimization algorithms to find varied high-performing solutions in reinforcement learning tasks.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

Q-Guided Stein Variational Model Predictive Control via RL-informed Policy Prior

Researchers have developed Q-SVMPC, a new Model Predictive Control method that combines reinforcement learning with Stein variational inference to improve trajectory optimization. The approach addresses limitations in existing MPC methods that often converge to single solutions, instead maintaining diverse solution paths for better performance in robotics applications.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2

Diffusion-MPC in Discrete Domains: Feasibility Constraints, Horizon Effects, and Critic Alignment: Case study with Tetris

Researchers studied diffusion-based model predictive control in discrete domains using Tetris, finding that feasibility constraints are necessary and shorter planning horizons outperform longer ones. The study reveals structural challenges with discrete diffusion planners, particularly misalignment issues with DQN critics that produce high decision regret.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Improving Diffusion Planners by Self-Supervised Action Gating with Energies

Researchers propose SAGE (Self-supervised Action Gating with Energies), a new method to improve diffusion planners in offline reinforcement learning by filtering out dynamically inconsistent trajectories. The approach uses a latent consistency signal to re-rank candidate actions at inference time, improving performance across locomotion, navigation, and manipulation tasks without requiring environment rollouts or policy retraining.
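A hedged sketch of the inference-time re-ranking pattern SAGE belongs to (the paper's actual energy model and latent consistency signal are not reproduced; the toy "energy" below is an assumption): candidate trajectories from a planner are scored for dynamical consistency and the lowest-energy one is executed.

```python
import numpy as np

def energy(trajectory):
    # toy consistency score: penalize large jumps between successive states;
    # SAGE instead learns a self-supervised latent consistency signal
    diffs = np.diff(trajectory, axis=0)
    return float(np.sum(diffs ** 2))

candidates = [
    np.array([[0.0], [0.1], [0.2]]),   # smooth, dynamically plausible
    np.array([[0.0], [1.5], [0.1]]),   # large jump: likely inconsistent
]
scores = [energy(t) for t in candidates]
best = candidates[int(np.argmin(scores))]
print(scores)
```

Because the filter acts only at inference time, it slots in front of a frozen diffusion planner, matching the summary's point that no environment rollouts or policy retraining are needed.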

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction

Researchers introduce a multi-agent collaboration framework for zero-shot document-level event argument extraction that uses AI agents to generate, evaluate, and refine synthetic training data. The system employs reinforcement learning to iteratively improve both data generation quality and argument extraction performance through a collaborative process.

AI · Bullish · arXiv – CS AI · Mar 4 · 4/10 · 2

Reinforcement Learning with Symbolic Reward Machines

Researchers propose Symbolic Reward Machines (SRMs) as an improvement over traditional Reward Machines in reinforcement learning, eliminating the need for manual user input while maintaining performance. SRMs process observations directly through symbolic formulas, making them more applicable to widely adopted RL frameworks.
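For context, a reward machine is a small finite-state machine that emits rewards as symbolic events occur; the summary says SRMs evaluate those events directly on observations via symbolic formulas. A hedged toy version (the predicates, states, and reward values below are assumptions for illustration, not the paper's construction):

```python
def has_key(obs):
    # symbolic predicate evaluated directly on the raw observation
    return obs["key"]

def at_goal(obs):
    return obs["x"] > 0.9

# machine states: 0 = searching for key, 1 = carrying key, 2 = done
def step_machine(state, obs):
    if state == 0 and has_key(obs):
        return 1, 0.1          # picked up the key: small shaping reward
    if state == 1 and at_goal(obs):
        return 2, 1.0          # reached the goal with the key: task reward
    return state, 0.0

state, total = 0, 0.0
for obs in [{"x": 0.2, "key": False},
            {"x": 0.5, "key": True},
            {"x": 0.95, "key": True}]:
    state, r = step_machine(state, obs)
    total += r
print(state, total)  # 2 1.1
```

Evaluating predicates like `has_key` straight from the observation, rather than from hand-labeled events, is what removes the manual user input the summary highlights.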

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3

Proactive Guiding Strategy for Item-side Fairness in Interactive Recommendation

Researchers propose HRL4PFG, a new interactive recommendation framework using hierarchical reinforcement learning to promote fairness by guiding user preferences toward long-tail items. The approach aims to balance item-side fairness with user satisfaction, showing improved performance in cumulative interaction rewards and user engagement length compared to existing methods.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

Researchers developed AIGB-Pearl, a new AI-driven auto-bidding system that combines generative planning with policy optimization to improve advertising performance. The system addresses limitations of existing offline reinforcement learning methods by incorporating a trajectory evaluator and safe exploration mechanisms beyond static datasets.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 5

Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Researchers propose Dual-Horizon Credit Assignment (DuCA), a new framework for optimizing large language models in industrial sales applications. The method addresses training instability by separately normalizing short-term linguistic rewards and long-term commercial rewards, achieving 6.82% improvement in conversion rates while reducing repetition and detection issues.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 6

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection

Researchers propose PGOS (Policy-Guided Outlier Synthesis), a new framework that uses reinforcement learning to improve Graph Neural Network safety by better detecting out-of-distribution graphs. The system replaces static sampling methods with a learned exploration strategy that navigates low-density regions to generate pseudo-OOD graphs for enhanced detector training.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 5

Integrating LTL Constraints into PPO for Safe Reinforcement Learning

Researchers developed PPO-LTL, a new framework that integrates Linear Temporal Logic safety constraints into Proximal Policy Optimization for safer reinforcement learning. The system uses Büchi automata to monitor for violations and converts them into penalty signals, reducing safety violations while maintaining competitive performance in robotics environments.
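A hedged sketch of the general monitor-to-penalty pattern the summary describes (the paper's actual automaton construction from LTL and its penalty weighting are not reproduced; the trivial two-state monitor and weight below are assumptions): a safety automaton for the property "always not unsafe" flags violating steps, and the flag is folded into the reward PPO optimizes.

```python
PENALTY = 5.0  # assumed penalty weight

def monitor(state, unsafe):
    # minimal safety automaton for G(not unsafe):
    # "ok" absorbs into "violated" once an unsafe step occurs
    if state == "ok" and unsafe:
        return "violated"
    return state

def shaped_reward(env_reward, mon_state):
    return env_reward - (PENALTY if mon_state == "violated" else 0.0)

mon, total = "ok", 0.0
for env_reward, unsafe in [(1.0, False), (1.0, True), (1.0, False)]:
    mon = monitor(mon, unsafe)
    total += shaped_reward(env_reward, mon)
print(mon, total)  # violated -7.0
```

Making the violated state absorbing is one design choice; it keeps penalizing every step after a violation, so the policy gradient strongly discourages trajectories that ever leave the safe set.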

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 7

SubstratumGraphEnv: Reinforcement Learning Environment (RLE) for Modeling System Attack Paths

Researchers developed SubstratumGraphEnv, a reinforcement learning framework that models Windows system attack paths using graph representations derived from Sysmon logs. The system combines Graph Convolutional Networks with Actor-Critic models to automate cybersecurity threat analysis and identify malicious process sequences.