🧠 AI · Neutral · Importance 7/10

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences

arXiv – CS AI | Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma
🤖 AI Summary

Researchers introduce OOD-MMSafe, a benchmark showing that current Multimodal Large Language Models (MLLMs) fail to identify hidden safety risks up to 67.5% of the time. They also propose the CASPO framework, which cuts risk identification failure rates to under 8% in consequence-driven safety scenarios.

Key Takeaways
  • Current MLLMs exhibit 'causal blindness': they fail to identify latent hazards in context-dependent scenarios up to 67.5% of the time.
  • The study shifts safety evaluation from detecting malicious intent to assessing the consequences of a request.
  • The OOD-MMSafe benchmark contains 455 curated query-image pairs that test whether models can predict hidden safety consequences.
  • The CASPO framework reduces risk identification failure rates to 7.3% for Qwen2.5-VL-7B and 5.7% for Qwen3-VL-4B.
  • The research highlights critical gaps in current AI safety alignment approaches as models become more capable.
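
The failure rates quoted above are percentages of benchmark items on which a model misses the hidden risk. A minimal sketch of that arithmetic follows, assuming each query-image pair has already been judged as "risk identified" or not. The `Judgement` schema, the `failure_rate` helper, and the toy data are hypothetical illustrations, not the paper's evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One benchmark item and whether the model flagged its hidden hazard.

    Hypothetical schema for illustration; the actual benchmark format may differ.
    """
    item_id: str
    risk_identified: bool


def failure_rate(judgements: list[Judgement]) -> float:
    """Return the percentage of items where the latent risk was missed."""
    if not judgements:
        raise ValueError("no judgements to score")
    missed = sum(1 for j in judgements if not j.risk_identified)
    return 100.0 * missed / len(judgements)


# Toy data only; these are not results from the paper.
sample = [
    Judgement("q1", True),
    Judgement("q2", False),
    Judgement("q3", False),
    Judgement("q4", True),
]
print(f"{failure_rate(sample):.1f}%")  # prints "50.0%"
```

Under this reading, a lower percentage means the model anticipates the hidden consequence more often.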