AI · Neutral · arXiv – CS AI · 18h ago · 5/10
Provably Efficient Personalized Multi-Objective Bandits with Proactive Conversational Queries
Researchers present MO-PQUCB, a novel algorithm for personalized multi-objective decision-making that combines conversational queries with bandit feedback to learn user preferences more efficiently. The method uses a Plackett-Luce choice model and shift-invariant regularization to overcome fundamental learning barriers, demonstrating improved regret scaling and robustness to corrupted preference signals compared to existing approaches.
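To make the Plackett-Luce ingredient concrete, here is a minimal Python sketch of its choice probabilities. It is an illustration only, not the MO-PQUCB algorithm: the function names, the example utilities, and the mean-centering step are assumptions made for this sketch. It shows a well-known property of the model: adding the same constant to every utility leaves the choice probabilities unchanged, so choice data alone cannot pin down the absolute level of the utilities. This is the kind of ambiguity that a shift-invariant treatment has to account for.

```python
import math


def plackett_luce_choice_probs(utilities):
    """Probability that each item is picked first under a Plackett-Luce model.

    P(i) = exp(u_i) / sum_j exp(u_j). The max is subtracted before
    exponentiating for numerical stability; this does not change the result.
    """
    m = max(utilities)
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def center(utilities):
    """Hypothetical normalization: subtract the mean so the utilities sum to zero.

    This is one simple way to pick a representative among all shifted
    copies of a utility vector; the paper's regularizer may differ.
    """
    mean = sum(utilities) / len(utilities)
    return [u - mean for u in utilities]


# Illustrative utilities (made up for this example).
utilities = [1.0, 0.5, -0.2]
shifted = [u + 3.0 for u in utilities]  # same preferences, shifted by a constant

probs = plackett_luce_choice_probs(utilities)
probs_shifted = plackett_luce_choice_probs(shifted)

# The two probability vectors agree up to floating-point error.
print(probs)
print(probs_shifted)

# After centering, the original and shifted utilities coincide.
print(center(utilities))
print(center(shifted))
```

Because both utility vectors produce the same choice probabilities, any estimator fit to preference feedback can only recover utilities up to a common shift; centering is one simple convention for removing that freedom.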