AINeutralarXiv – CS AI · 14h ago6/10
🧠
MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization
Researchers introduce MuPHI, a dataset and training framework for detecting implicit multimodal harm in image-text pairs where danger emerges from context-dependent reasoning rather than surface features. The proposed MuPHIRM framework uses reward optimization to improve vision-language models' ability to reason about compositional harm while demonstrating stronger generalization to out-of-distribution scenarios.