y0news
🧠 AI | 🟢 Bullish | Importance: 6/10

DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

arXiv – CS AI | Walid Bousselham, Angie Boggust, Hendrik Strobelt, Hilde Kuehne
🤖 AI Summary

Researchers developed DEX-AR, a new explainability method for autoregressive Vision-Language Models (VLMs) that generates 2D heatmaps showing how these systems make decisions. The method addresses the difficulty of interpreting modern VLMs by analyzing token-by-token generation and visual-textual interactions, and shows improved performance across multiple benchmarks.

Key Takeaways
  • DEX-AR introduces dynamic head filtering to identify attention heads focused on visual information in autoregressive VLMs.
  • The method generates both per-token and sequence-level 2D heatmaps to explain model decision-making processes.
  • Traditional explainability methods designed for classification tasks struggle with modern autoregressive Vision-Language Models.
  • The approach distinguishes between visually-grounded and purely linguistic tokens during explanation generation.
  • Evaluation on ImageNet, VQAv2, and PascalVOC showed consistent improvements in both perturbation-based and segmentation-based metrics.
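The mechanics described in the takeaways can be sketched in code. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes attention weights of shape `(heads, seq_len, seq_len)` from one layer, selects "visual" heads by their attention mass on image-patch tokens (a stand-in for the paper's dynamic head filtering, whose actual selection rule is not detailed here), and reshapes the averaged attention into a per-token 2D heatmap.

```python
import numpy as np

def per_token_heatmap(attn, image_token_idx, query_idx, grid=(16, 16), top_k=4):
    """Hypothetical sketch of a per-token 2D heatmap for one generated token.

    attn: (num_heads, seq_len, seq_len) attention weights from one layer.
    image_token_idx: sequence positions of the image-patch tokens.
    query_idx: position of the generated token to explain.
    """
    # Attention from the query token to every image patch, per head.
    to_image = attn[:, query_idx, image_token_idx]      # (heads, patches)

    # Stand-in for dynamic head filtering: keep the heads that place the
    # most attention mass on visual tokens (selection rule is assumed).
    mass = to_image.sum(axis=1)
    visual_heads = np.argsort(mass)[-top_k:]

    # Average the selected heads and fold the patches into a 2D grid.
    return to_image[visual_heads].mean(axis=0).reshape(grid)

# Toy usage: 4 text tokens followed by 256 image-patch tokens.
rng = np.random.default_rng(0)
seq_len = 4 + 256
attn = rng.random((8, seq_len, seq_len))
attn /= attn.sum(axis=-1, keepdims=True)                # normalize rows
heatmap = per_token_heatmap(attn, np.arange(4, seq_len), query_idx=0)
print(heatmap.shape)  # (16, 16)
```

A sequence-level heatmap, as described above, could then be obtained by averaging the per-token maps over the tokens identified as visually grounded rather than purely linguistic.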
Read Original → via arXiv – CS AI