#temporal-difference News & Analysis

6 articles tagged with #temporal-difference. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning

Researchers identify Trace-Mediated Peak Bias (TMPB), a systematic failure in deep reinforcement learning where agents irrationally prioritize high-magnitude reward spikes over trajectories with greater cumulative returns. This phenomenon mirrors the human Peak-End Rule cognitive bias and reveals how mathematical constraints in credit assignment systems naturally produce human-like value distortions, with adaptive optimizers offering a potential solution.

AI · Neutral · arXiv – CS AI · May 29 · 5/10

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

Researchers propose STHTD-MP, a new machine learning algorithm that improves off-policy prediction by using behavior-policy information to optimize the geometry of gradient temporal-difference methods. The method demonstrates faster convergence than existing approaches like GTD2-MP under certain conditions, with theoretical guarantees and empirical validation on standard benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning

Researchers propose a non-linear transformer architecture that enables reinforcement learning agents to generalize across different domains through in-context learning, establishing a theoretical connection between transformers and kernel-based temporal difference learning. By interpreting transformers as operators in Reproducing Kernel Hilbert Space, the work demonstrates that value functions from diverse domains can share a unified weight set, with MetaWorld experiments validating the approach.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes

Researchers propose R-GTD, a regularized gradient temporal-difference learning algorithm that maintains convergence guarantees even when the feature interaction matrix becomes singular—a practical limitation in existing GTD methods. The geometric analysis provides explicit error bounds and addresses a key stability challenge in off-policy reinforcement learning with function approximation.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Extending Differential Temporal Difference Methods for Episodic Problems

Researchers propose a generalization of differential temporal difference (TD) methods that extends their applicability from infinite-horizon to episodic reinforcement learning problems. By addressing how reward centering affects policy optimization in episodic settings, the work maintains theoretical guarantees while empirically demonstrating improved sample efficiency across multiple algorithms and environments.
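
For readers unfamiliar with the starting point, here is a minimal sketch of the standard tabular differential TD(0) update for the infinite-horizon, average-reward setting, where the reward is centered by a running average-reward estimate. It is the textbook baseline, not the paper's episodic extension; the function name, argument names, and step sizes `alpha` and `eta` are assumptions chosen for illustration.

```python
def differential_td_step(values, avg_reward, s, r, s_next, alpha=0.1, eta=0.5):
    """One tabular differential TD(0) update (illustrative names, not from the paper).

    values     : mutable list of differential value estimates, indexed by state
    avg_reward : current estimate of the average reward per step
    Returns the updated average-reward estimate and the TD error.
    """
    # Centered TD error: the reward is measured relative to the average reward,
    # and there is no discount factor in the average-reward formulation.
    delta = r - avg_reward + values[s_next] - values[s]
    # Move the value of the visited state toward the centered target.
    values[s] += alpha * delta
    # Update the average-reward estimate with the same TD error.
    avg_reward += eta * alpha * delta
    return avg_reward, delta


values = [0.0, 0.0]
avg_reward = 0.0
avg_reward, delta = differential_td_step(values, avg_reward, s=0, r=1.0, s_next=1)
```

In this sketch the average-reward estimate is driven by the TD error rather than by the raw reward, which is one common choice; other estimators are possible.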

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10

Chunk-Guided Q-Learning

Researchers introduce Chunk-Guided Q-Learning (CGQ), a new offline reinforcement learning algorithm that combines single-step and multi-step temporal difference learning approaches. The method achieves better performance on long-horizon tasks by reducing error accumulation while maintaining fine-grained value propagation, with theoretical guarantees and empirical validation on OGBench tasks.
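
To make the single-step versus multi-step distinction concrete, the sketch below computes the two generic TD targets that such methods build on. It does not reproduce CGQ itself; the function names and arguments are assumptions for illustration only.

```python
def one_step_target(reward, next_value, gamma):
    """Single-step TD target: bootstrap from the very next state's value."""
    return reward + gamma * next_value


def n_step_target(rewards, bootstrap_value, gamma):
    """Multi-step TD target: sum n discounted rewards, then bootstrap once.

    rewards         : list of the n rewards observed along the trajectory
    bootstrap_value : value estimate of the state reached after n steps
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


single = one_step_target(1.0, 2.0, gamma=0.5)
multi = n_step_target([1.0, 1.0, 1.0], 0.0, gamma=0.5)
```

With a single reward the multi-step target reduces to the single-step one. Longer horizons bootstrap less often, which limits the compounding of value-estimate errors, at the cost of coarser value propagation and more dependence on the data-collecting policy.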