AI Summary
Researchers investigated whether Vision-Language Models (VLMs) can reason robustly under distribution shifts and found that fine-tuned VLMs achieve high accuracy in-distribution but fail to generalize. They propose VLC, a neuro-symbolic method combining VLM-based concept recognition with circuit-based symbolic reasoning that demonstrates consistent performance under covariate shifts.
Key Takeaways
- Fine-tuned VLMs achieve high in-distribution accuracy but fail to generalize under covariate shifts in visual reasoning tasks.
- Traditional gradient-based end-to-end training does not reliably induce the underlying reasoning functions in VLMs.
- Recent neuro-symbolic approaches with black-box reasoning components still exhibit inconsistent robustness across tasks.
- The proposed VLC method decouples perception from reasoning by combining VLM concept recognition with circuit-based symbolic execution.
- VLC consistently achieves strong performance under covariate shifts across three distinct visual deductive reasoning tasks.
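The decoupling described above can be illustrated with a minimal sketch. All names here (`recognize_concepts`, `circuit_rule`, `vlc_pipeline`) are hypothetical, and the VLM perception stage is stubbed with a lookup table; the point is only the structure: perception emits symbolic concepts, and a fixed deterministic rule reasons over them, so a distribution shift affects only the perception stage.

```python
# Hypothetical sketch of a perception/reasoning split, not the paper's code.
# Perception (a VLM in the real system) is stubbed with a lookup table.

def recognize_concepts(image_id):
    """Stand-in for VLM concept recognition: image -> symbolic concepts."""
    stub_vlm = {
        "img_1": {"shape": "cube", "color": "red", "count": 2},
        "img_2": {"shape": "sphere", "color": "blue", "count": 3},
    }
    return stub_vlm[image_id]

def circuit_rule(concepts):
    """Symbolic 'circuit': a fixed Boolean function over recognized concepts.
    Toy rule: answer True iff the object is red and appears at least twice."""
    return concepts["color"] == "red" and concepts["count"] >= 2

def vlc_pipeline(image_id):
    # Perception and reasoning are decoupled: a covariate shift changes
    # only the inputs to recognize_concepts, never circuit_rule itself.
    return circuit_rule(recognize_concepts(image_id))

print(vlc_pipeline("img_1"))  # True
print(vlc_pipeline("img_2"))  # False
```

Because the reasoning step is an explicit symbolic function rather than learned weights, its behavior is identical in- and out-of-distribution; robustness then reduces to the perception stage recognizing concepts correctly.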
Read Original (via arXiv – CS AI)