y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#reasoning-gap News & Analysis

2 articles tagged with #reasoning-gap. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBearisharXiv – CS AI · May 127/10
🧠

The Gordian Knot for VLMs: Diagrammatic Knot Reasoning as a Hard Benchmark

Researchers unveiled KnotBench, a comprehensive benchmark testing vision-language models' ability to reason about knot diagrams, revealing that current models like Claude Opus and GPT-5 struggle fundamentally with spatial reasoning and symbolic operations despite perceiving visual details. The benchmark demonstrates a critical gap between perception and reasoning capabilities, with most tasks scoring near or below random chance, suggesting VLMs lack mechanisms to simulate geometric transformations.

🧠 GPT-5🧠 Claude🧠 Opus
AINeutralarXiv – CS AI · Apr 146/10
🧠

FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks

Researchers introduced FinTrace, a benchmark dataset with 800 expert-annotated trajectories for evaluating how large language models perform financial tool-calling tasks. The study reveals that while frontier LLMs excel at selecting appropriate tools, they struggle significantly with information utilization and generating accurate final outputs, pointing to a critical reasoning gap that persists even after fine-tuning with preference optimization techniques.