Quantile Geometry Regularization for Distributional Reinforcement Learning
Researchers propose RQIQN, a new reinforcement learning method that improves quantile-based distributional RL by addressing distorted distribution estimates through Wasserstein distributionally robust optimization. The approach adds a lightweight correction to quantile targets that prevents distributional collapse while maintaining computational efficiency, demonstrating superior performance on navigation and Atari benchmarks.
This research addresses a fundamental problem in quantile-based distributional reinforcement learning: bootstrapped target quantiles can produce degenerate or distorted distribution estimates that compromise learning quality. The RQIQN method reformulates IQN loss as local empirical quantile estimation problems and applies Wasserstein distributionally robust optimization to each quantile slot, yielding a closed-form correction mechanism.
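To make the "local empirical quantile estimation" framing concrete, the sketch below shows the standard IQN-style quantile Huber loss, in which each sampled fraction τ defines its own quantile-regression subproblem over the bootstrapped targets. This is the baseline objective the paper reformulates; the shapes, the kappa default, and the NumPy phrasing are illustrative choices, not the paper's code.

```python
import numpy as np

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Standard IQN-style quantile Huber loss.

    pred:   (N,) predicted quantile values at fractions `taus`.
    target: (M,) bootstrapped target quantile samples.
    taus:   (N,) sampled fractions in (0, 1).

    Each fraction taus[i] defines a local quantile-regression problem:
    pred[i] should estimate the taus[i]-quantile of the target
    return distribution.
    """
    td = target[None, :] - pred[:, None]              # (N, M) TD errors
    abs_td = np.abs(td)
    huber = np.where(abs_td <= kappa,
                     0.5 * td ** 2,
                     kappa * (abs_td - 0.5 * kappa))
    # Asymmetric quantile weight |tau - 1{td < 0}| tilts the Huber
    # loss so each slot converges to its own quantile level.
    weight = np.abs(taus[:, None] - (td < 0.0).astype(td.dtype))
    return (weight * huber / kappa).mean(axis=1).sum()
```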
The contribution builds on years of progress in distributional RL, which seeks to learn complete return distributions rather than point estimates. Prior quantile-regression approaches have shown promise but suffer from collapse of the estimated distribution when bootstrapped targets become misaligned. This work connects robust optimization theory with practical RL concerns, offering both theoretical grounding and computational elegance: the correction requires no additional sample reconstruction or objective modification.
The geometric regularization mechanism works through median antisymmetry: the correction is antisymmetric about the median fraction τ = 0.5, so its contributions cancel in the risk-neutral quantile average, while its monotonicity in τ enlarges quantile gaps, directly counteracting the distributional spread collapse observed in prior methods. This design preserves the underlying value objective's integrity while addressing a specific pathology in distribution estimation.
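A minimal sketch of one function with exactly these two properties follows; the linear form and the strength `lam` are illustrative assumptions, not the paper's closed-form Wasserstein DRO correction.

```python
import numpy as np

def geometric_correction(taus, lam=0.1):
    """Hypothetical fraction-dependent correction with the two stated
    properties: median antisymmetry, g(1 - tau) = -g(tau), so the
    corrections cancel in the risk-neutral quantile average; and
    monotonicity in tau, so adding it to sorted targets can only
    widen the gaps between adjacent quantiles.
    """
    return lam * (taus - 0.5)

# With fractions placed symmetrically around 0.5, the correction is
# mean-zero (risk-neutral behavior preserved) and strictly increasing
# (every quantile gap is enlarged):
taus = np.linspace(0.05, 0.95, 10)
targets = np.sort(np.random.randn(10))
corrected = targets + geometric_correction(taus)
assert abs(corrected.mean() - targets.mean()) < 1e-12
assert np.all(np.diff(corrected) >= np.diff(targets))
```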
For practitioners developing RL agents, this represents an incremental but meaningful improvement in distributional methods. Because the correction is lightweight, integrating it into existing IQN-based systems adds minimal computational overhead. Empirical validation on navigation tasks and Atari games demonstrates consistent gains over existing quantile-based approaches. The theoretical framing may also inspire related robustness improvements in other distributional RL variants.
- RQIQN uses Wasserstein distributionally robust optimization to fix degenerate quantile distributions in IQN-based learning
- The method applies fraction-dependent Bellman target corrections that prevent distributional collapse without changing the core objective
- Geometric regularization preserves risk-neutral quantile averaging while enlarging quantile gaps to combat spread collapse
- Implementation requires minimal computational overhead and no sample reconstruction, enabling easy integration into existing systems (see the sketch after this list)
- Empirical results show consistent performance gains over prior quantile-based distributional RL methods across multiple domains
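As a rough illustration of the integration point above, here is a hedged sketch of how a fraction-dependent correction could slot into an existing IQN Bellman-target computation. The shapes, names, and the reuse of the linear correction from the earlier sketch are assumptions for illustration, not the paper's exact mechanism.

```python
import numpy as np

def corrected_bellman_targets(rewards, dones, next_quantiles, taus,
                              gamma=0.99, lam=0.1):
    """Vanilla distributional Bellman targets plus a hypothetical
    fraction-dependent correction (the linear, median-antisymmetric
    form from the earlier sketch).

    Assumed shapes: rewards (B,), dones (B,), next_quantiles (B, N),
    taus (N,). Only the final line differs from a standard IQN target;
    the extra cost is O(B * N) additions, and no samples are rebuilt.
    """
    # Standard one-step distributional Bellman backup.
    targets = rewards[:, None] + gamma * (1.0 - dones[:, None]) * next_quantiles
    # Hypothetical correction: mean-zero across symmetric fractions and
    # monotone in tau, so it widens quantile gaps without shifting the mean.
    return targets + lam * (taus - 0.5)[None, :]
```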