arXiv – CS AI
Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought
Researchers introduce SegWorld, a segmentation model that uses visual chain-of-thought reasoning to understand scenes and segment object parts based on high-level intent rather than explicit target descriptions. The model proactively observes scenes, infers affordances, and maps user instructions to specific physical interaction points, outperforming baselines on intent-level tasks while matching them on traditional target-referential instructions.