🧠 AIβšͺ NeutralImportance 5/10

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models

arXiv – CS AI | Rohit Saxena, Alessandro Suglia, Pasquale Minervini
🤖 AI Summary

Researchers introduce VLM-RobustBench, a comprehensive benchmark testing vision-language models across 133 corrupted image settings. The study reveals that current VLMs are semantically strong but spatially fragile, with low-severity spatial distortions often causing more performance degradation than visually severe photometric corruptions.

Key Takeaways
  • VLM-RobustBench evaluates vision-language models across 49 augmentation types and 133 corrupted image settings.
  • Visual severity is a weak predictor of difficulty: low-severity spatial perturbations often degrade performance more than severe photometric corruptions.
  • Low-severity glass blur reduces MMBench accuracy by 8 percentage points on average across models.
  • Geometric distortions such as upsample and elastic transform cause the largest performance drops, reaching up to 34 percentage points.
  • Current vision-language models demonstrate semantic strength but significant spatial fragility under real-world image distortions.
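
The takeaways report degradation in percentage points, averaged across models. As a minimal sketch of that arithmetic (not the benchmark's actual evaluation code), the snippet below computes the drop for each model and then the mean. The function names, model names, and accuracy values are hypothetical placeholders chosen for illustration and are not results from the paper.

```python
from statistics import mean


def pp_drop(clean_acc: float, corrupted_acc: float) -> float:
    """Accuracy drop in percentage points; inputs are fractions in [0, 1]."""
    return (clean_acc - corrupted_acc) * 100.0


def mean_pp_drop(results: dict[str, tuple[float, float]]) -> float:
    """Average the per-model drop; values are (clean, corrupted) accuracy pairs."""
    return mean(pp_drop(clean, corrupted) for clean, corrupted in results.values())


# Hypothetical placeholder numbers, NOT figures from VLM-RobustBench.
toy_results = {
    "model_a": (0.80, 0.75),  # drops 5 percentage points
    "model_b": (0.60, 0.50),  # drops 10 percentage points
}

print(f"mean drop: {mean_pp_drop(toy_results):.1f} pp")
```

A percentage-point drop is an absolute difference, so a fall from 60% to 50% counts as 10 points regardless of the model's starting accuracy. A relative drop would instead divide by the clean accuracy.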