y0news

#benchmark-critique News & Analysis

1 article tagged with #benchmark-critique. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 15h ago · 7/10

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

Researchers report that AI models can hold stable factual knowledge while failing dramatically at compositional reasoning (assembling facts into logical chains), a failure that standard benchmark metrics do not surface. The study introduces a diagnostic protocol showing that post-training improvements mask directional shifts in composition capability, and that failures often stem from generation-time constraints rather than fundamental model limitations.