
Uneven Evolution of Cognition Across Generations of Generative AI Models

arXiv – CS AI | Isaac Galatzer-Levy, Daniel McDuff, Xin Liu, Jed McGiffin

🤖 AI Summary

Researchers have developed a psychometric framework to evaluate generative AI models' cognitive abilities across generations, revealing profound imbalances in their cognitive profiles. While leading multimodal models excel at verbal comprehension and working memory (>98th percentile), they severely lag in perceptual reasoning (<1st percentile), indicating that scaling alone cannot achieve human-like general intelligence.

Analysis

This research introduces a critical measurement framework for AI cognition that moves beyond task-specific benchmarks. The study's finding that generative models possess radically uneven cognitive profiles—excelling in language-based reasoning while nearly failing at visual-perceptual tasks—has significant implications for AI development strategies. The architectural bias toward linguistic processing suggests that current scaling approaches may be hitting fundamental limitations that cannot be overcome through simply adding more parameters or data.

The development of the AIQ Benchmark to track performance across model generations provides the first longitudinal view of cognitive evolution in AI systems. This context matters because it challenges the prevailing assumption in the industry that continued model scaling automatically produces more balanced, human-like intelligence. The asymmetric improvements across AI model families demonstrate that different cognitive domains mature at vastly different rates, with abstract quantitative reasoning advancing rapidly in linguistic contexts but visual reasoning stagnating.

For developers and researchers pursuing artificial general intelligence, these findings suggest that architectural redesigns may be necessary rather than incremental improvements to existing paradigms. The discovery that language-presented tasks advance faster than visually-presented equivalents indicates that multimodal models require fundamental rebalancing. Organizations investing heavily in scaling language models without addressing perceptual reasoning may find themselves building increasingly lopsided systems. This research provides empirical evidence that achieving balanced, human-like cognition in AI requires deliberate architectural innovations beyond current optimization techniques.

Key Takeaways
  • Leading multimodal AI models show extreme cognitive imbalances, excelling at verbal tasks (>98th percentile) while failing at visual reasoning (<1st percentile)
  • The newly developed AIQ Benchmark reveals asymmetric performance gains across six AI model generations, with language-based reasoning advancing faster than visual tasks
  • Scaling and optimization alone appear insufficient to overcome fundamental architectural limitations in achieving balanced artificial general intelligence
  • Models demonstrate architectural bias toward language-based symbolic manipulation over visual-perceptual processing
  • Current AI development approaches may require fundamental redesigns rather than incremental improvements to achieve human-like cognitive balance
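The uneven profile described above can be made concrete with a small sketch. The function and the domain names below follow standard psychometric index labels; the percentile values are illustrative placeholders anchored only to the two figures quoted in the summary (>98th percentile for verbal comprehension and working memory, <1st percentile for perceptual reasoning), not actual scores from the paper.

```python
def profile_imbalance(percentiles: dict[str, float]) -> float:
    """Spread between the strongest and weakest cognitive domain.

    A balanced, human-like profile would have a small spread;
    the models described in the summary show a near-maximal one.
    """
    return max(percentiles.values()) - min(percentiles.values())


# Hypothetical profile, anchored to the figures quoted in the summary.
model_profile = {
    "verbal_comprehension": 98.0,  # quoted: >98th percentile
    "working_memory": 98.0,        # quoted: >98th percentile
    "perceptual_reasoning": 1.0,   # quoted: <1st percentile
}

print(profile_imbalance(model_profile))  # 97.0
```

A spread of 97 percentile points illustrates why the authors argue that averaging across benchmarks can mask a deeply lopsided intelligence.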