🧠 AI · ⚪ Neutral · Importance 7/10
The ARC of Progress towards AGI: A Living Survey of Abstraction and Reasoning
🤖AI Summary
A comprehensive survey of 82 AI approaches to the ARC-AGI benchmark finds consistent 2-3x performance drops across all paradigms when moving from ARC-AGI-1 to ARC-AGI-2, with human-level reasoning still out of reach. Although costs have fallen dramatically (390x in one year), AI systems still struggle with compositional generalization, reaching only 13% on ARC-AGI-3 against near-perfect human performance.
Key Takeaways
- All AI paradigms (program synthesis, neuro-symbolic, neural) show a consistent 2-3x performance degradation from ARC-AGI-1 to ARC-AGI-2, indicating fundamental limitations in compositional generalization.
- The best current AI performance reaches 93% on ARC-AGI-1 but drops to 68.8% on ARC-AGI-2 and only 13% on ARC-AGI-3, while humans maintain near-perfect accuracy across all versions.
- Costs for AI reasoning tasks fell 390x in one year, from roughly $4,500 per task to roughly $12 per task, though this largely reflects reduced test-time parallelism rather than efficiency gains.
- Smaller models (660M-8B parameters) achieve results competitive with trillion-scale models, supporting the thesis that intelligence is about skill-acquisition efficiency rather than raw scale.
- ARC Prize 2025 winners required hundreds of thousands of synthetic examples to reach only 24% on ARC-AGI-2, confirming that AI reasoning remains heavily knowledge-bound.
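The ratios behind these figures can be sanity-checked in a few lines of Python. This is a minimal sketch that uses only the rounded numbers quoted in the takeaways; `fold_change` is a hypothetical helper introduced here for illustration, not something from the survey. The rounded dollar amounts give about 375x rather than exactly 390x, which suggests the quoted endpoints are rounded. The best-score ratios compare the top result on each benchmark version (not necessarily the same system), so they need not match the per-paradigm 2-3x figure.

```python
def fold_change(before: float, after: float) -> float:
    """Return how many times larger `before` is than `after`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Rounded figures as quoted in the takeaways (assumed, not exact values).
cost_fold = fold_change(4500.0, 12.0)  # USD per task, one year apart
v1_to_v2 = fold_change(93.0, 68.8)     # best score: ARC-AGI-1 vs ARC-AGI-2
v2_to_v3 = fold_change(68.8, 13.0)     # best score: ARC-AGI-2 vs ARC-AGI-3

print(f"cost reduction from rounded figures: {cost_fold:.0f}x")
print(f"best-score ratio, v1 to v2: {v1_to_v2:.2f}x")
print(f"best-score ratio, v2 to v3: {v2_to_v3:.2f}x")
```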
Mentioned in AI
Models
GPT-5 (OpenAI)
Opus (Anthropic)
#agi #artificial-intelligence #machine-learning #reasoning #benchmarks #performance #generalization #ai-research #cognitive-abilities
Read Original → via arXiv – CS AI