511 articles tagged with #reinforcement-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI: Bullish · arXiv – CS AI · Mar 16 · 5/10
🧠 Researchers developed an improved Residual Reinforcement Learning method that uses uncertainty estimation to enhance sample efficiency and work with stochastic base policies. The approach outperformed existing methods in simulation benchmarks and demonstrated successful zero-shot sim-to-real transfer in real-world deployments.
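The core idea, adding a learned residual correction on top of a base policy and gating it by uncertainty, can be sketched as follows. This is a minimal illustration, not the paper's method: the gating formula and all names are assumptions.

```python
def residual_action(base_action, residual, uncertainty, max_scale=1.0):
    """Blend a base policy's action with a learned residual correction.

    The residual is down-weighted when its uncertainty estimate is high,
    so the agent falls back to the (possibly stochastic) base policy in
    unfamiliar states. The inverse gating rule here is illustrative.
    """
    gate = max_scale / (1.0 + uncertainty)  # shrink correction as uncertainty grows
    return base_action + gate * residual

# With zero uncertainty the full residual is applied ...
assert abs(residual_action(0.5, 0.2, uncertainty=0.0) - 0.7) < 1e-9
# ... and a very uncertain residual is mostly ignored.
assert abs(residual_action(0.5, 0.2, uncertainty=99.0) - 0.5) < 0.01
```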
AI: Neutral · arXiv – CS AI · Mar 11 · 5/10
🧠 Researchers introduce the Overfitting-Underfitting Indicator (OUI) to analyze learning rate sensitivity in PPO reinforcement learning systems. The metric can identify problematic learning rates early in training by measuring neural activation patterns, enabling more efficient hyperparameter screening without full training runs.
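An activation-based indicator of this kind might be sketched as below. The specific statistic (comparing near-saturated against near-dead activations) and the thresholds are assumptions for illustration, not the paper's definition of OUI.

```python
def activation_indicator(activations, low=0.05, high=0.95):
    """Toy activation-pattern indicator: compare the share of
    near-saturated units (a proxy for an overfitting regime) against
    the share of near-dead ones (a proxy for underfitting).

    A positive value leans overfit, a negative one leans underfit;
    both statistic and thresholds are illustrative assumptions.
    """
    saturated = sum(1 for a in activations if a > high)
    dead = sum(1 for a in activations if a < low)
    return (saturated - dead) / len(activations)

assert activation_indicator([0.99, 0.98, 0.5, 0.01]) == 0.25
```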
AI: Neutral · arXiv – CS AI · Mar 11 · 5/10
🧠 Researchers developed a new framework for training robust AI policies in partially observable environments where adversaries can manipulate hidden initial conditions. The study demonstrates improved robustness through targeted exposure to shifted latent distributions, reducing performance gaps in benchmark tests.
AI: Neutral · arXiv – CS AI · Mar 9 · 4/10
🧠 Researchers propose a new reinforcement learning approach for large language models that optimizes for subsets of future rewards rather than full sequences. The method enables comparison of different policy classes and shows varying effectiveness across different conversational AI alignment tasks.
AI: Neutral · arXiv – CS AI · Mar 9 · 4/10
🧠 Researchers propose a reference architecture for reinforcement learning frameworks after analyzing 18 state-of-the-practice implementations. The study identifies recurring architectural components and relationships to establish a common basis for comparison, evaluation, and integration across RL frameworks.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers introduced RVN-Bench, a new benchmark for testing indoor visual navigation systems for mobile robots that emphasizes collision avoidance in cluttered environments. Built on the Habitat 2.0 simulator with high-fidelity HM3D scenes, it provides tools for training and evaluating AI agents that navigate using only visual observations without prior maps.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Research evaluates offline reinforcement learning algorithms for wireless network control, finding Conservative Q-Learning produces more robust policies under stochastic conditions than sequence-based methods. The study provides practical guidance for AI-driven network management in O-RAN and 6G systems where online exploration is unsafe.
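Conservative Q-Learning's key ingredient is a regularizer that pushes Q-values of out-of-distribution actions down while keeping the dataset action's Q-value up. A simplified scalar sketch of that penalty (the real loss uses a log-sum-exp over the policy's action distribution and is combined with the standard Bellman error):

```python
import math

def cql_penalty(q_random_actions, q_dataset_action):
    """Simplified CQL-style conservatism penalty.

    A log-sum-exp over Q-values of sampled (possibly out-of-distribution)
    actions is penalized relative to the Q-value of the action actually
    observed in the offline dataset. Minimizing this keeps the critic
    from overestimating unseen actions.
    """
    lse = math.log(sum(math.exp(q) for q in q_random_actions))
    return lse - q_dataset_action
```

For two sampled actions with Q = 0 each and a dataset action with Q = 1, the penalty is log(2) − 1 ≈ −0.307, i.e. the critic is already conservative and incurs no pressure.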
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers trained a compact 1.5B-parameter language model to solve beam physics problems using reinforcement learning with verifiable rewards, achieving a 66.7% improvement in accuracy. However, the model learned pattern-matching templates rather than true physics reasoning, failing to generalize to topological changes despite mastering the same underlying equations.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers developed a multi-agent simulation framework using reinforcement learning to model archaeological mobility patterns in complex terrain. The system combines global path planning with local adaptation to simulate human and animal movement in historical landscapes, demonstrated through pursuit scenarios and transport analysis.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers propose UrbanHuRo, a two-layer human-robot collaboration framework that jointly optimizes different urban services such as delivery and sensing. The system demonstrated a 29.7% improvement in sensing coverage and a 39.2% increase in courier income while reducing overdue orders through coordinated optimization of heterogeneous services.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers propose DSRM-HRL, a new framework that uses diffusion models to purify user preference data and hierarchical reinforcement learning to balance recommendation accuracy with fairness. The system addresses bias in interactive recommendation systems by separating state estimation from decision-making, achieving better outcomes on both utility and exposure equity.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 A research paper analyzes reward functions used in reinforcement learning for autonomous driving, identifying gaps in current approaches. The study categorizes objectives into Safety, Comfort, Progress, and Traffic Rules compliance, highlighting limitations in objective aggregation and context awareness.
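The aggregation issue the survey highlights is easy to see in the most common scheme, a fixed weighted sum over the four objective classes. The weights below are arbitrary assumptions; a fixed linear mix like this is exactly what the paper critiques for ignoring driving context.

```python
def driving_reward(safety, comfort, progress, rules,
                   weights=(1.0, 0.2, 0.5, 0.8)):
    """Fixed linear aggregation over the four objective classes
    (Safety, Comfort, Progress, Traffic Rules). Context-blind by
    construction: the same weights apply in an empty lot and in
    dense traffic. Weights are illustrative, not from the paper.
    """
    ws, wc, wp, wr = weights
    return ws * safety + wc * comfort + wp * progress + wr * rules

assert abs(driving_reward(1.0, 0.0, 1.0, 1.0) - 2.3) < 1e-9
```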
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers propose a standardized framework for classifying and evaluating memory capabilities in reinforcement learning agents, drawing from cognitive science concepts. The paper addresses confusion around memory terminology in RL and provides practical definitions for different memory types along with robust experimental methodologies.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers present AutoQD, a new AI method that automatically discovers diverse behavioral policies without requiring hand-crafted descriptors. The approach uses mathematical embeddings of policy occupancy measures to enable Quality-Diversity optimization algorithms to find varied high-performing solutions in reinforcement learning tasks.
AI: Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers have developed Q-SVMPC, a new Model Predictive Control method that combines reinforcement learning with Stein variational inference to improve trajectory optimization. The approach addresses limitations in existing MPC methods that often converge to single solutions, instead maintaining diverse solution paths for better performance in robotics applications.
AI: Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers studied diffusion-based model predictive control in discrete domains using Tetris, finding that feasibility constraints are necessary and shorter planning horizons outperform longer ones. The study reveals structural challenges with discrete diffusion planners, particularly misalignment issues with DQN critics that produce high decision regret.
AI: Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers propose SAGE (Self-supervised Action Gating with Energies), a new method to improve diffusion planners in offline reinforcement learning by filtering out dynamically inconsistent trajectories. The approach uses a latent consistency signal to re-rank candidate actions at inference time, improving performance across locomotion, navigation, and manipulation tasks without requiring environment rollouts or policy retraining.
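Inference-time re-ranking of this kind is a small wrapper around the planner: score each candidate with a consistency energy and keep the best, touching neither the planner nor the environment. The energy function below is a stand-in assumption, not SAGE's learned latent-consistency signal.

```python
def rerank_actions(candidates, energy):
    """Inference-time action gating: score each candidate action with
    a consistency energy and keep the lowest-energy one, filtering
    dynamically implausible proposals without environment rollouts or
    retraining. `energy` stands in for a learned consistency model.
    """
    return min(candidates, key=energy)

# Toy energy: distance from a "dynamically consistent" action of 0.3.
assert rerank_actions([0.0, 0.25, 0.9], lambda a: abs(a - 0.3)) == 0.25
```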
AI: Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers introduce a multi-agent collaboration framework for zero-shot document-level event argument extraction that uses AI agents to generate, evaluate, and refine synthetic training data. The system employs reinforcement learning to iteratively improve both data generation quality and argument extraction performance through a collaborative process.
AI: Bullish · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers propose Symbolic Reward Machines (SRMs) as an improvement over traditional Reward Machines in reinforcement learning, eliminating the need for manual user input while maintaining performance. SRMs process observations directly through symbolic formulas, making them more applicable to widely adopted RL frameworks.
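A reward machine is a small automaton whose transitions dispense reward; the symbolic variant triggers transitions from predicates evaluated on the raw observation rather than hand-labelled events. A toy two-transition sketch, with predicates and rewards that are purely illustrative:

```python
def make_srm():
    """Tiny reward-machine sketch: the machine state advances when a
    symbolic predicate over the observation holds, rather than relying
    on a manually supplied labelling function. (state, predicate) pairs
    and rewards here are illustrative, not from the paper."""
    transitions = {
        ("start", "at_key"): ("has_key", 0.0),
        ("has_key", "at_door"): ("done", 1.0),
    }

    def step(state, obs):
        for (src, pred), (dst, reward) in transitions.items():
            if src == state and obs.get(pred):
                return dst, reward
        return state, 0.0  # no predicate fired: stay put, no reward

    return step

step = make_srm()
assert step("start", {"at_key": True}) == ("has_key", 0.0)
assert step("has_key", {"at_door": True}) == ("done", 1.0)
```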
AI: Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers propose HRL4PFG, a new interactive recommendation framework using hierarchical reinforcement learning to promote fairness by guiding user preferences toward long-tail items. The approach aims to balance item-side fairness with user satisfaction, showing improved performance in cumulative interaction rewards and user engagement length compared to existing methods.
AI: Neutral · arXiv – CS AI · Mar 4 · 4/10
🧠 Researchers developed AIGB-Pearl, a new AI-driven auto-bidding system that combines generative planning with policy optimization to improve advertising performance. The system addresses limitations of existing offline reinforcement learning methods by incorporating a trajectory evaluator and safe exploration mechanisms beyond static datasets.
AI: Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠 Researchers propose Dual-Horizon Credit Assignment (DuCA), a new framework for optimizing large language models in industrial sales applications. The method addresses training instability by separately normalizing short-term linguistic rewards and long-term commercial rewards, achieving a 6.82% improvement in conversion rates while reducing repetition and detection issues.
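The separate-normalization idea can be sketched in a few lines: z-score each reward stream on its own scale before mixing, so the sparser, higher-variance commercial signal cannot swamp the per-turn linguistic one. The z-score choice and the mixing weight are assumptions for illustration.

```python
def dual_horizon_signal(ling_rewards, comm_rewards, beta=0.5):
    """Normalize short-horizon linguistic rewards and long-horizon
    commercial rewards separately, then mix them. Per-stream z-scoring
    keeps either stream's scale from dominating the gradient signal;
    the normalizer and weight `beta` are illustrative assumptions.
    """
    def zscore(xs):
        mean = sum(xs) / len(xs)
        sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5 or 1.0
        return [(x - mean) / sd for x in xs]

    return [l + beta * c for l, c in zip(zscore(ling_rewards),
                                         zscore(comm_rewards))]
```

After normalization, linguistic rewards of [1, −1] and commercial rewards of [10, −10] contribute on the same scale, giving a mixed signal of [1.5, −1.5] with beta = 0.5.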
AI: Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠 Researchers propose PGOS (Policy-Guided Outlier Synthesis), a new framework that uses reinforcement learning to improve Graph Neural Network safety by better detecting out-of-distribution graphs. The system replaces static sampling methods with a learned exploration strategy that navigates low-density regions to generate pseudo-OOD graphs for enhanced detector training.
AI: Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠 Researchers developed PPO-LTL, a new framework that integrates Linear Temporal Logic safety constraints into Proximal Policy Optimization for safer reinforcement learning. The system uses Büchi automata to monitor safety violations and converts them into penalty signals, showing reduced safety violations while maintaining competitive performance in robotics environments.
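The monitor-to-penalty idea can be illustrated with the simplest safety property, G(!collision): an automaton walks the trace of atomic propositions, enters an absorbing unsafe state on a violation, and emits a penalty added to the PPO reward. A real Büchi automaton tracks much richer temporal formulas; this two-state version is a sketch only.

```python
def safety_monitor(trace, forbidden="collision"):
    """Runtime monitor for G(!forbidden): scan a trace of proposition
    sets, flip to an absorbing 'unsafe' state on the first violation,
    and accumulate a penalty that can be fed into PPO's reward.
    The per-step penalty of -1.0 is an illustrative assumption.
    """
    penalty, violated = 0.0, False
    for props in trace:
        if forbidden in props:
            violated = True
        if violated:
            penalty -= 1.0  # absorbing unsafe state keeps penalizing
    return penalty

# Violation at step 2 penalizes that step and every step after it.
assert safety_monitor([set(), {"collision"}, set()]) == -2.0
```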
AI: Neutral · arXiv – CS AI · Mar 3 · 5/10
🧠 Researchers developed SubstratumGraphEnv, a reinforcement learning framework that models Windows system attack paths using graph representations derived from Sysmon logs. The system combines Graph Convolutional Networks with Actor-Critic models to automate cybersecurity threat analysis and identify malicious process sequences.