AI · Neutral · arXiv – CS AI · 11h ago · 6/10
MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
Researchers introduce MAVRL, a method that learns a reward function from several heterogeneous feedback types (demonstrations, comparisons, ratings, and stops) at the same time. It frames reward learning as Bayesian inference and approximates the posterior with amortized variational inference, which removes the need to balance per-feedback losses by hand. The authors report that it outperforms single-feedback approaches on discrete and continuous control benchmarks.
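To make the "no manual loss balancing" point concrete, here is a minimal sketch of how a Bayesian treatment can combine feedback types: each type contributes its own log-likelihood, and the terms are simply summed with a log-prior, with no hand-picked weights. This is an illustration under stated assumptions, not MAVRL's actual model. The linear reward, the Bradley-Terry comparison model, the Gaussian rating model, the Boltzmann demonstration model, and every function name below are hypothetical choices made for the example. Stop feedback and the amortized variational step are omitted.

```python
import math

def traj_return(w, traj):
    """Return of a trajectory under an assumed linear reward w . phi(s)."""
    return sum(sum(wi * fi for wi, fi in zip(w, phi)) for phi in traj)

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def comparison_loglik(w, preferred, other):
    # Assumed Bradley-Terry model: P(preferred beats other) = sigmoid(R_p - R_o).
    return log_sigmoid(traj_return(w, preferred) - traj_return(w, other))

def rating_loglik(w, traj, rating, sigma=1.0):
    # Assumed Gaussian noise model: rating ~ Normal(R(traj), sigma^2).
    diff = rating - traj_return(w, traj)
    return -0.5 * (diff / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def demo_loglik(w, demo, candidates):
    # Assumed Boltzmann-rational demonstrator choosing among candidate trajectories.
    returns = [traj_return(w, c) for c in candidates]
    m = max(returns)
    log_z = m + math.log(sum(math.exp(r - m) for r in returns))
    return traj_return(w, demo) - log_z

def joint_log_posterior(w, comparisons, ratings, demos, prior_sigma=1.0):
    # Unnormalized log-posterior: Gaussian log-prior plus unweighted log-likelihoods.
    log_prior = -0.5 * sum((wi / prior_sigma) ** 2 for wi in w)
    ll = sum(comparison_loglik(w, a, b) for a, b in comparisons)
    ll += sum(rating_loglik(w, t, r) for t, r in ratings)
    ll += sum(demo_loglik(w, d, cands) for d, cands in demos)
    return log_prior + ll

# Toy data: trajectories are lists of 2-d feature vectors (made up for illustration).
good = [[1.0, 0.0], [1.0, 0.0]]
bad = [[0.0, 1.0], [0.0, 1.0]]
comparisons = [(good, bad)]      # "good" was preferred over "bad"
ratings = [(good, 2.0)]          # "good" was rated 2.0
demos = [(good, [good, bad])]    # "good" was demonstrated out of two options

aligned = joint_log_posterior([1.0, -1.0], comparisons, ratings, demos)
flipped = joint_log_posterior([-1.0, 1.0], comparisons, ratings, demos)
print(aligned > flipped)
```

In this toy setup, reward weights that agree with all three feedback sources score a higher log-posterior than weights that contradict them. An amortized variational approach would, roughly speaking, train a network to output an approximate posterior over reward parameters instead of evaluating candidate weights one at a time as done here.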