24 articles tagged with #cognitive-science. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers propose a cognitive diagnostic framework that evaluates large language models across fine-grained ability dimensions rather than aggregate scores, enabling targeted model improvement and task-specific selection. The approach uses multidimensional Item Response Theory to estimate abilities across 35 dimensions for mathematics and generalizes to physics, chemistry, and computer science with strong predictive accuracy.
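The scoring step of a multidimensional IRT model of this kind fits in a few lines. The sketch below is illustrative only: the two-dimensional setup, the ability and loading values, and the function name are assumptions for exposition, not details from the paper (which estimates 35 dimensions).

```python
import math

def irt_prob(theta, a, b):
    """Multidimensional 2PL IRT: P(correct) = sigmoid(a . theta - b)."""
    logit = sum(t * w for t, w in zip(theta, a)) - b
    return 1.0 / (1.0 + math.exp(-logit))

# Illustrative two-dimensional case:
theta = [1.2, -0.4]   # model's estimated ability on each dimension
a = [1.5, 0.2]        # how strongly this item loads on each dimension
b = 0.5               # item difficulty
p = irt_prob(theta, a, b)   # probability the model answers this item correctly
```

Fitting such a model means estimating a theta vector per LLM and (a, b) per benchmark item from observed pass/fail data, after which per-dimension abilities can be compared directly rather than through one aggregate score.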
AI · Bullish · arXiv – CS AI · 2d ago · 7/10
🧠 Researchers introduce dual-trace memory encoding for LLM agents, pairing factual records with narrative scene reconstructions to improve cross-session recall by 20+ percentage points. The method significantly enhances temporal reasoning and multi-session knowledge aggregation without increasing computational costs, advancing the capability of persistent AI agent systems.
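A minimal sketch of what pairing the two traces could look like, assuming a simple keyword-matching store; all class and method names here are hypothetical, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class DualTraceMemory:
    """One episode stored as two parallel traces."""
    session_id: str
    facts: list        # factual trace: atomic records for verbatim retrieval
    narrative: str     # scene trace: reconstructed story giving temporal context

class MemoryStore:
    def __init__(self):
        self.entries = []

    def encode(self, session_id, facts, narrative):
        self.entries.append(DualTraceMemory(session_id, facts, narrative))

    def recall(self, keyword):
        """Return facts from any session where either trace mentions the keyword."""
        hits = []
        for e in self.entries:
            if keyword in e.narrative or any(keyword in f for f in e.facts):
                hits.extend(e.facts)
        return hits

store = MemoryStore()
store.encode("s1", ["user prefers tea"],
             "Morning chat: the user chose tea over coffee.")
store.encode("s2", ["meeting moved to Friday"],
             "A later session rescheduling the weekly sync.")
found = store.recall("tea")
```

Because the narrative trace carries scene context, `store.recall("coffee")` also surfaces the tea fact even though no factual record mentions coffee — the kind of cross-session, contextual recall the dual trace is meant to enable.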
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠 Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠 Researchers demonstrate that robots equipped with minimal embodied sensorimotor capabilities learn numerical concepts significantly faster than vision-only systems, achieving 96.8% counting accuracy with 10% of training data. The embodied neural network spontaneously develops biologically plausible number representations matching human cognitive development, suggesting embodiment acts as a structural learning prior rather than merely an information source.
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠 Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.
🧠 GPT-5 · 🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers found that large language models align with human brain activity during creative thinking tasks, with alignment increasing based on model size and idea originality. Different post-training approaches selectively reshape how LLMs align with creative versus analytical neural patterns in humans.
🧠 Llama
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers propose a new theoretical framework called the 'Third Entity' to describe the emergent cognitive formation that arises from human-AI interactions, introducing the concept of 'vibe-creation' as a pre-reflective cognitive mode. The paper argues this represents the automation of tacit knowledge with significant implications for epistemology, education, and how we understand human-AI collaboration.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠 Researchers introduce LifeBench, a new AI benchmark that tests long-term memory systems by requiring integration of both declarative and non-declarative memory across extended timeframes. Current state-of-the-art memory systems achieve only 55.2% accuracy on this challenging benchmark, highlighting significant gaps in AI's ability to handle complex, multi-source memory tasks.
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
🧠 Research comparing four state-of-the-art language models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur) to humans in goal selection tasks reveals substantial divergence in behavior. While humans explore diverse approaches and learn gradually, the AI models tend to exploit single solutions or show poor performance, raising concerns about using current LLMs as proxies for human decision-making in critical applications.
🧠 Claude · 🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠 Researchers analyzed Meta's NLLB-200 neural machine translation model across 135 languages, finding that it has implicitly learned universal conceptual structures and language genealogical relationships. The study reveals the model creates language-neutral conceptual representations similar to how multilingual brains organize information, with semantic relationships preserved across diverse languages.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers propose the Compression Efficiency Principle (CEP) to explain why artificial neural networks and biological brains develop similar representations despite different substrates. The theory suggests both systems converge on efficient compression strategies that encode stable invariants rather than unstable correlations, providing a unified framework for understanding intelligence across biological and artificial systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠 Researchers discovered that large language models exhibit working memory limitations similar to humans, encoding multiple memory items in entangled representations that require interference control rather than direct retrieval. This finding reveals a shared computational constraint between biological and artificial systems, suggesting that working memory capacity may be a fundamental bottleneck in intelligent systems rather than a limitation unique to biological brains.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠 Researchers propose a human-centered framework for evaluating whether AI systems fail in ways similar to humans by measuring out-of-distribution performance across a spectrum of perceptual difficulty rather than arbitrary distortion levels. Testing this approach on vision models reveals that vision-language models show the most consistent human alignment, while CNNs and ViTs diverge from humans in regime-dependent ways as task difficulty varies.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠 Researchers formalize how agents can use environmental artifacts as external memory to reduce computational requirements in reinforcement learning tasks. The study demonstrates that spatial observations can implicitly serve as memory substitutes, allowing agents to learn effective policies with less internal memory capacity than previously thought necessary.
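The idea can be illustrated with a toy grid-world sweep in which the agent's only "memory" of visited cells is a marker written into the environment itself. This is a sketch of the general technique, not the paper's formalism.

```python
# The agent leaves markers in the world instead of remembering which
# cells it has visited: reads and writes go to the grid, not to any
# internal visited-set.
def explore(grid_size, start):
    grid = [[0] * grid_size for _ in range(grid_size)]   # 0 = unmarked cell
    path = []
    stack = [start]
    while stack:
        x, y = stack.pop()
        if grid[y][x]:           # marker read from the environment
            continue
        grid[y][x] = 1           # marker written back into the environment
        path.append((x, y))
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < grid_size and 0 <= ny < grid_size and not grid[ny][nx]:
                stack.append((nx, ny))
    return path

visited = explore(3, (0, 0))   # covers every cell of a 3x3 grid exactly once
```

The agent's internal state here is just the work stack; the persistent record of where it has been lives entirely in the environment, which is the sense in which spatial observations substitute for internal memory.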
AI · Bearish · Fortune Crypto · 6d ago · 6/10
🧠 Psychologists warn that AI automation of routine tasks may harm cognitive health, as mundane work provides necessary mental recovery and default-mode processing. While AI promises productivity gains by eliminating boring work, research suggests these seemingly unproductive tasks are essential for brain function and psychological well-being.
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠 Research reveals that Large Language Models struggle with dynamic Theory of Mind tasks, particularly tracking how others' beliefs change over time. While LLMs can infer current beliefs effectively, they fail to maintain and retrieve prior belief states after updates occur, showing patterns consistent with human cognitive biases.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose a new AI learning architecture inspired by human and animal cognition that integrates observational learning and active behavior learning. The framework includes a meta-control system that switches between learning modes, addressing current limitations in autonomous AI learning.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠 Researchers have developed PsyCogMetrics AI Lab, a cloud-based platform that applies psychometric and cognitive science methodologies to evaluate Large Language Models. The platform was created through a three-cycle Action Design Science study and aims to advance AI evaluation methods at the intersection of psychology, cognitive science, and artificial intelligence.
AI · Neutral · arXiv – CS AI · Mar 6 · 6/10
🧠 Researchers replicated and extended AI introspection studies, finding that large language models detect injected thoughts through two distinct mechanisms: probability-matching based on prompt anomalies and direct access to internal states. The direct access mechanism is content-agnostic, meaning models can detect anomalies but struggle to identify their semantic content, often confabulating high-frequency concepts.
AI · Bullish · MIT News – AI · Jan 14 · 5/10
🧠 MIT has renamed and expanded its intelligence research initiative to the MIT Siegel Family Quest for Intelligence with support from the Siegel Family Endowment. The program focuses on understanding how brains produce intelligence and developing methods to replicate this intelligence for practical problem-solving applications.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers propose a standardized framework for classifying and evaluating memory capabilities in reinforcement learning agents, drawing from cognitive science concepts. The paper addresses confusion around memory terminology in RL and provides practical definitions for different memory types along with robust experimental methodologies.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers propose that language models could help address longstanding challenges in cognitive science research, including integration, formalization, and conceptual clarity. The paper suggests AI tools should complement rather than replace human researchers to create more integrative and cumulative cognitive science.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠 Researchers propose using category theory to formalize knowledge domains and construct analogies between different fields. The paper demonstrates this approach using the classic analogy between the solar system and hydrogen atom, showing how mathematical structures like functors and pullbacks can define analogical relationships.
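Stripped of the categorical machinery, the core check — that an analogy is a structure-preserving map between two domains — can be rendered as a toy program. The graphs and the object map below are illustrative; the paper works with functors and pullbacks rather than plain dictionaries.

```python
# Each domain is a tiny relation graph: key "orbits" value.
solar = {"planet": "sun"}        # planet orbits sun
atom  = {"electron": "nucleus"}  # electron orbits nucleus

# Object map of the analogy (the functor's action on objects).
F = {"planet": "electron", "sun": "nucleus"}

def preserves_structure(rel_src, rel_dst, obj_map):
    """Check every edge x -> y in the source maps to an edge F(x) -> F(y)."""
    return all(obj_map[x] in rel_dst and rel_dst[obj_map[x]] == obj_map[y]
               for x, y in rel_src.items())

ok = preserves_structure(solar, atom, F)
# Swapping the targets breaks the orbit relation, so the map is no analogy:
bad = preserves_structure(solar, atom, {"planet": "nucleus", "sun": "electron"})
```

The functorial view adds what this sketch omits: maps between relations themselves (morphisms), and pullbacks for computing the shared structure two domains have in common.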
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers analyzed how Large Language Models access semantic memory using the Semantic Fluency Task, finding that LLMs exhibit similar memory foraging patterns to humans. The study reveals convergent and divergent search strategies in LLMs that mirror human cognitive behavior, potentially enabling better human-AI alignment or productive cognitive disalignment.
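The standard foraging analysis for fluency data can be sketched in miniature: a drop in similarity between consecutive recalled items marks a "patch switch", the signature of memory foraging. The word features and threshold below are illustrative toys, not the paper's embeddings.

```python
# Hand-made binary features standing in for real semantic vectors.
ANIMALS = {
    "dog":  {"pet": 1, "farm": 0, "wild": 0},
    "cat":  {"pet": 1, "farm": 0, "wild": 0},
    "cow":  {"pet": 0, "farm": 1, "wild": 0},
    "pig":  {"pet": 0, "farm": 1, "wild": 0},
    "lion": {"pet": 0, "farm": 0, "wild": 1},
}

def similarity(a, b):
    """Dot product of the two feature vectors."""
    return sum(ANIMALS[a][k] * ANIMALS[b][k] for k in ANIMALS[a])

def patch_switches(sequence, threshold=1):
    """Indices where similarity to the previous item falls below threshold."""
    return [i for i in range(1, len(sequence))
            if similarity(sequence[i - 1], sequence[i]) < threshold]

# "dog, cat" (pets) -> "cow, pig" (farm) -> "lion" (wild): two patch switches.
switches = patch_switches(["dog", "cat", "cow", "pig", "lion"])
```

Running the same analysis on human and LLM fluency sequences and comparing switch rates and within-patch similarity curves is how convergent versus divergent search strategies can be quantified.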