AI · Neutral · arXiv – CS AI · 15h ago · 6/10
From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
Researchers propose Calibrated Interactive RL, a framework that addresses distribution shift in multi-turn dialogue systems by combining interactive reinforcement learning with simulator alignment. The authors show, both theoretically and empirically, that aligning the simulator with human interaction patterns significantly improves the performance of LLM-based dialogue agents compared with static-context and unaligned interactive methods.