y0news
← Feed
←Back to feed
🧠 AIβšͺ NeutralImportance 7/10

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences

arXiv – CS AI | Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma
🤖 AI Summary

Researchers introduce OOD-MMSafe, a new benchmark showing that current Multimodal Large Language Models (MLLMs) fail to identify hidden safety risks up to 67.5% of the time. They also propose the CASPO framework, which reduces risk-identification failure rates to under 8% in consequence-driven safety scenarios.

Key Takeaways
  • β†’Current MLLMs show significant 'causal blindness' with failure rates up to 67.5% in identifying latent hazards in context-dependent scenarios.
  • β†’The study introduces a shift from malicious intent detection to consequence-driven safety evaluation for AI models.
  • β†’OOD-MMSafe benchmark contains 455 curated query-image pairs to test models' ability to predict hidden safety consequences.
  • β†’CASPO framework reduces risk identification failure rates to 7.3% for Qwen2.5-VL-7B and 5.7% for Qwen3-VL-4B.
  • β†’Research highlights critical gaps in current AI safety alignment approaches as models become more capable.
Read Original → via arXiv – CS AI