Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning
Researchers identify Trace-Mediated Peak Bias (TMPB), a systematic failure in deep reinforcement learning where agents irrationally prioritize high-magnitude reward spikes over trajectories with greater cumulative returns. This phenomenon mirrors the human Peak-End Rule cognitive bias and reveals how mathematical constraints in credit assignment systems naturally produce human-like value distortions, with adaptive optimizers offering a potential solution.