AI · Neutral · arXiv CS AI · 4h ago
Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective
Researchers propose SafeQIL, a Q-learning algorithm that learns safe policies from expert demonstrations in constrained environments where the safety constraints are unknown. The approach balances maximizing task reward against maintaining safety by learning from demonstrated trajectories that complete the task without violating the hidden constraints.