AI · Bearish · arXiv – CS AI · 10h ago · 7/10
The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark
Researchers introduced KnotBench, a benchmark that tests whether vision-language models (VLMs) can reason about knot diagrams. Current models such as Claude Opus and GPT-5 struggle with spatial reasoning and symbolic operations even though they perceive the visual details. Most tasks score near or below random chance, which points to a gap between perception and reasoning and suggests that VLMs lack mechanisms to simulate geometric transformations.
🧠 GPT-5 · 🧠 Claude · 🧠 Opus