y0news

#q-learning News & Analysis

9 articles tagged with #q-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10

Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

New research provides theoretical analysis of reinforcement learning's impact on Large Language Model planning capabilities, revealing that RL improves generalization through exploration while supervised fine-tuning may create spurious solutions. The study shows Q-learning maintains output diversity better than policy gradient methods, with findings validated on real-world planning benchmarks.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10

Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective

Researchers propose SafeQIL, a new Q-learning algorithm that learns safe policies from expert demonstrations in constrained environments where safety constraints are unknown. The approach balances maximizing task rewards while maintaining safety by learning from demonstrated trajectories that successfully complete tasks without violating hidden constraints.
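The summary doesn't give SafeQIL's actual update rule, so the following is only a generic sketch of one way to stay safe when constraints are unknown: restrict both action selection and Q-value bootstrapping to actions the expert actually demonstrated in each state, on the assumption that demonstrated actions respect the hidden constraints. The mask layout and state/action counts here are hypothetical.

```python
import numpy as np

# Hedged sketch, NOT SafeQIL's algorithm: constrain the greedy operator to the
# support of expert demonstrations, which (by assumption) never violate the
# unknown safety constraints.
n_states, n_actions = 4, 3
demo_mask = np.zeros((n_states, n_actions), dtype=bool)
# (state, action) pairs covered by the expert's safe trajectories:
for s, a in [(0, 1), (1, 2), (2, 1), (3, 0)]:
    demo_mask[s, a] = True

Q = np.random.default_rng(0).normal(size=(n_states, n_actions))

def safe_greedy(Q, s):
    """argmax over demonstrated actions only; undemonstrated ones are masked out."""
    masked = np.where(demo_mask[s], Q[s], -np.inf)
    return int(np.argmax(masked))
```

With only demonstrated actions eligible, the learned policy can never select an action outside the expert's support, whatever the Q-values say.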

AI · Neutral · arXiv – CS AI · Feb 27 · 5/10

QSIM: Mitigating Overestimation in Multi-Agent Reinforcement Learning via Action Similarity Weighted Q-Learning

Researchers propose QSIM, a new framework that addresses systematic Q-value overestimation in multi-agent reinforcement learning by using action similarity weighted Q-learning instead of traditional greedy approaches. The method demonstrates improved performance and stability across various value decomposition algorithms through similarity-weighted target calculations.
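The summary doesn't spell out QSIM's exact target, so here is a hedged sketch of the general idea of a similarity-weighted bootstrap: instead of taking the max over next-state Q-values (which bootstraps from the largest noise spike and so overestimates), average Q-values weighted by each action's similarity to the greedy action. The cosine-similarity-over-embeddings weighting and the `temp` parameter are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def similarity_weighted_target(q_next, action_emb, reward, gamma, temp=1.0):
    """Sketch of a similarity-weighted Q-target (not QSIM's exact rule).

    Rather than bootstrapping from max_a Q(s', a), average the Q-values of all
    actions, weighted by softmax-scaled cosine similarity of each action's
    embedding to the greedy action's embedding.
    """
    greedy = int(np.argmax(q_next))
    emb = action_emb / np.linalg.norm(action_emb, axis=1, keepdims=True)
    sims = emb @ emb[greedy]            # cosine similarity to the greedy action
    weights = np.exp(sims / temp)
    weights /= weights.sum()
    return reward + gamma * float(weights @ q_next)

# Noisy Q estimates whose true values are all zero: the max operator latches
# onto the largest positive noise, while the weighted average is less optimistic.
rng = np.random.default_rng(0)
q_next = rng.normal(0.0, 1.0, size=4)
action_emb = rng.normal(size=(4, 8))
t_weighted = similarity_weighted_target(q_next, action_emb, reward=0.0, gamma=0.99)
t_max = 0.0 + 0.99 * q_next.max()
```

Because the target is a convex combination of next-state Q-values, it can never exceed the max-based target, which is the sense in which such weighting mitigates overestimation.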

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10

Chunk-Guided Q-Learning

Researchers introduce Chunk-Guided Q-Learning (CGQ), a new offline reinforcement learning algorithm that combines single-step and multi-step temporal difference learning approaches. The method achieves better performance on long-horizon tasks by reducing error accumulation while maintaining fine-grained value propagation, with theoretical guarantees and empirical validation on OGBench tasks.
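The summary says CGQ combines single-step and multi-step temporal-difference learning but not how, so the snippet below is only a hedged illustration of the trade-off being blended: a 1-step target (fine-grained, slow value propagation) interpolated with an n-step target over a "chunk" of transitions (faster propagation, more accumulated error). The interpolation weight `lam` and this particular blend are assumptions for illustration.

```python
def blended_td_target(rewards, values, gamma, lam):
    """Hedged sketch: interpolate a 1-step and an n-step TD target.

    rewards: r_t, ..., r_{t+n-1} along the chunk
    values:  bootstrap values V(s_{t+1}), ..., V(s_{t+n})
    lam:     0.0 -> pure single-step target, 1.0 -> pure n-step target
    """
    n = len(rewards)
    one_step = rewards[0] + gamma * values[0]
    n_step = sum(gamma**k * rewards[k] for k in range(n)) + gamma**n * values[-1]
    return (1 - lam) * one_step + lam * n_step

# Chunk of 3 transitions; reward arrives only at the end of the chunk.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.8, 0.0]
gamma = 0.9
```

At `lam=0` the target ignores the delayed reward entirely; at `lam=1` it propagates that reward back in a single update, which is the long-horizon benefit multi-step methods buy at the cost of compounding off-policy error.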

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Researchers developed a new pessimistic auxiliary policy for offline reinforcement learning that reduces error accumulation by sampling more reliable actions. The approach maximizes the lower confidence bound of Q-functions to avoid high-value actions with potentially high errors during training.
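The summary doesn't give the paper's exact objective, but "maximizes the lower confidence bound of Q-functions" has a standard generic form that can be sketched with a Q-ensemble: score each action by mean minus a multiple of the ensemble's standard deviation, so high-mean actions the models disagree on are avoided. The ensemble shape and `beta` are illustrative assumptions.

```python
import numpy as np

def lcb_action(q_ensemble, beta=1.0):
    """Pick the action maximizing a lower confidence bound of the Q-value.

    q_ensemble: shape (n_models, n_actions), independent Q estimates.
    LCB = mean - beta * std penalizes actions the ensemble disagrees on,
    steering the policy away from high-value-but-high-error actions.
    """
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)
    return int(np.argmax(mean - beta * std))

# Action 0: modest value, ensemble agrees. Action 1: higher mean, wild disagreement.
q_ensemble = np.array([[1.0,  3.0],
                       [1.1, -1.0],
                       [0.9,  4.0]])
```

A plain greedy policy on the ensemble mean would pick action 1 here; the LCB picks the reliable action 0, which is exactly the "avoid high-value actions with potentially high errors" behavior the summary describes.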

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10

Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning

Researchers introduce iterated Shared Q-Learning (iS-QL), a new reinforcement learning method that bridges target-free and target-based approaches by using only the last linear layer as a target network while sharing other parameters. The technique achieves comparable performance to traditional target-based methods while maintaining the memory efficiency of target-free approaches.
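The structural idea in the summary — share everything with the online network except the final linear layer, and keep only that layer as a frozen target copy — can be sketched with a toy two-layer Q-network. The architecture and sync schedule here are illustrative, not the paper's.

```python
import numpy as np

# Hedged sketch of the iS-QL parameter-sharing idea (toy 2-layer network).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 4))         # trunk, shared by online AND target nets
W2_online = rng.normal(size=(3, 16))  # head trained every step
W2_target = W2_online.copy()          # frozen head, periodically re-synced

def q_online(s):
    return W2_online @ np.maximum(W1 @ s, 0.0)

def q_target(s):
    # Same trunk as the online net; only the head is a frozen copy.
    return W2_target @ np.maximum(W1 @ s, 0.0)

# Memory comparison: a conventional target network duplicates every parameter,
# while this scheme duplicates only the last layer.
full_copy_params = W1.size + W2_online.size
is_ql_extra_params = W2_target.size
```

Here the extra memory is 48 parameters instead of 112 for a full copy, which scales to the "memory efficiency of target-free approaches" the summary highlights once the shared trunk dominates the parameter count.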

AI · Neutral · Hugging Face Blog · May 18 · 3/10

An Introduction to Q-Learning Part 1

An educational article introducing Q-Learning, a value-based reinforcement learning algorithm widely used in AI and machine learning. The article body was not available for analysis.
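Although the article body is unavailable, the algorithm it introduces is textbook material: Q-learning maintains a table Q(s, a) and nudges each entry toward the bootstrapped target r + γ·max_a' Q(s', a'). A minimal tabular example on a toy 5-state chain (the environment, hyperparameters, and episode counts below are illustrative, not from the tutorial):

```python
import numpy as np

# Toy chain: states 0..4, action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, eps = 0.5, 0.9, 0.5
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s2 == n_states - 1
    return s2, (1.0 if done else 0.0), done

for _ in range(2000):                # episodes
    s = 0
    for _ in range(50):              # step cap per episode
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break

# Greedy policy after training: move right from every non-terminal state.
greedy = [int(Q[s].argmax()) for s in range(n_states - 1)]
```

After training, the greedy policy heads right from every state, and Q-values decay geometrically with distance from the goal (Q[3,1] ≈ 1, Q[2,1] ≈ 0.9, ...).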

AI · Neutral · Hugging Face Blog · May 20 · 2/10

An Introduction to Q-Learning Part 2/2

The second part of an educational series on Q-Learning, a reinforcement learning algorithm. The article body was empty, preventing detailed analysis of its content.

AI · Neutral · OpenAI News · Apr 21 · 1/10

Equivalence between policy gradients and soft Q-learning

The article appears to discuss the theoretical equivalence between policy gradient methods and soft Q-learning in reinforcement learning. The article body was empty, so detailed analysis was not possible.
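While the body is unavailable, the result the title refers to (Schulman et al., "Equivalence Between Policy Gradients and Soft Q-Learning") builds on two standard entropy-regularized identities: the soft value is a temperature-scaled log-sum-exp of Q, and the optimal regularized policy is the corresponding Boltzmann distribution. A sketch evaluating just those two formulas (the example Q-values and temperature are illustrative):

```python
import numpy as np

def soft_value(q, temp):
    # V(s) = temp * log sum_a exp(Q(s, a) / temp); tends to max_a Q as temp -> 0.
    return temp * np.log(np.sum(np.exp(q / temp)))

def soft_policy(q, temp):
    # pi(a|s) = exp((Q(s, a) - V(s)) / temp): a proper distribution by construction,
    # i.e. a softmax over Q-values at temperature temp.
    return np.exp((q - soft_value(q, temp)) / temp)

q = np.array([1.0, 2.0, 0.5])   # illustrative Q-values for one state
pi = soft_policy(q, temp=0.5)
```

The policy-gradient/soft-Q-learning connection flows through this parameterization: gradients of the entropy-regularized policy objective and soft Q-learning updates coincide when the policy has this Boltzmann form.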