🧠 AI · ⚪ Neutral · Importance 6/10
Context-Dependent Affordance Computation in Vision-Language Models
🤖 AI Summary
Researchers found that vision-language models such as Qwen-VL and LLaVA compute object affordances in highly context-dependent ways: over 90% of scene descriptions changed under contextual priming. The study shows that these models do not hold a fixed understanding of objects but dynamically reinterpret them according to the situational context.
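As a rough illustration of this kind of priming setup, the sketch below describes the same image under several situational prompts. The `vlm_describe` wrapper and the prompt wordings are assumptions made for illustration, not the paper's actual protocol.

```python
# A minimal sketch of contextual priming: the same image is described under
# several situational prompts. `vlm_describe(image_path, prompt) -> str` is a
# hypothetical wrapper around any VLM (e.g., Qwen-VL or LLaVA); the prompt
# wordings below are illustrative assumptions, not the paper's exact stimuli.

CONTEXTS = {
    "neutral": "Describe the objects in this scene.",
    "chef": "You are a chef preparing a meal. Describe the objects in this scene.",
    "child": "You are a small child trying to reach things. Describe the objects in this scene.",
}

def probe_affordance_drift(image_path, vlm_describe):
    """Collect one scene description per context condition."""
    return {name: vlm_describe(image_path, prompt) for name, prompt in CONTEXTS.items()}
```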
Key Takeaways
- Vision-language models show massive affordance drift, with only 9.5% similarity between descriptions of the same scene under different context conditions (see the sketch after this list for one way such drift could be measured).
- Over 90% of scene descriptions change lexically under contextual priming, and 58.5% change at the semantic level.
- The findings suggest robotics should move toward dynamic, query-dependent ontological projection rather than static world modeling.
- Two stable latent factors were identified: a 'Culinary Manifold' for chef contexts and an 'Access Axis' for child-mobility scenarios.
- Controlled experiments confirmed that the context effects are genuine rather than random generation noise.
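One simple way to quantify drift of this kind, purely as an illustration, is token-set Jaccard similarity between context-conditioned descriptions, with a within- vs. between-context gap as a crude noise control. Both metrics below are assumptions, not the paper's reported methodology.

```python
# A minimal sketch of one way to quantify affordance drift: token-set Jaccard
# similarity between context-conditioned descriptions, plus a within- vs.
# between-context gap as a crude noise control. Both metrics are illustrative
# assumptions, not the paper's reported methodology.

import itertools
import re

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two descriptions (0 = disjoint, 1 = identical)."""
    ta = set(re.findall(r"[a-z']+", a.lower()))
    tb = set(re.findall(r"[a-z']+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def mean_pairwise_similarity(descriptions: list[str]) -> float:
    """Average Jaccard similarity over all unordered pairs of descriptions."""
    pairs = list(itertools.combinations(descriptions, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def context_effect_gap(within: list[str], between: list[str]) -> float:
    """Positive gap = descriptions resampled within one context agree more
    than descriptions drawn from different contexts, i.e. the drift is not
    just generation noise."""
    return mean_pairwise_similarity(within) - mean_pairwise_similarity(between)
```

Under a metric like this, a low between-context similarity (the paper reports 9.5%) combined with high within-context similarity would indicate that context, not sampling noise, drives the changes.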
#vision-language-models #affordance-computation #context-dependency #ai-research #robotics #machine-learning #semantic-understanding
Read Original → via arXiv – CS AI