AI · Bullish · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers introduce dual-trace memory encoding for LLM agents, pairing factual records with narrative scene reconstructions to improve cross-session recall by 20+ percentage points. The method significantly enhances temporal reasoning and multi-session knowledge aggregation without increasing computational costs, advancing the capability of persistent AI agent systems.
AI · Neutral · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers propose a cognitive diagnostic framework that evaluates large language models across fine-grained ability dimensions rather than aggregate scores, enabling targeted model improvement and task-specific selection. The approach uses multidimensional Item Response Theory to estimate abilities across 35 dimensions for mathematics and generalizes to physics, chemistry, and computer science with strong predictive accuracy.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers demonstrate that robots equipped with minimal embodied sensorimotor capabilities learn numerical concepts significantly faster than vision-only systems, achieving 96.8% counting accuracy with only 10% of the training data. The embodied neural network spontaneously develops biologically plausible number representations that match human cognitive development, suggesting embodiment acts as a structural learning prior rather than merely an information source.
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.
🧠 GPT-5 · 🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers found that large language models align with human brain activity during creative thinking tasks, with alignment increasing with model size and idea originality. Different post-training approaches selectively reshape how LLMs align with creative versus analytical neural patterns in humans.
🧠 Llama
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠Researchers propose a new theoretical framework called the 'Third Entity' to describe the emergent cognitive formation that arises from human-AI interactions, introducing the concept of 'vibe-creation' as a pre-reflective cognitive mode. The paper argues this represents the automation of tacit knowledge with significant implications for epistemology, education, and how we understand human-AI collaboration.
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
🧠Research comparing four state-of-the-art language models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur) with humans on goal-selection tasks reveals substantial divergence in behavior. While humans explore diverse approaches and learn gradually, the AI models tend to exploit single solutions or perform poorly, raising concerns about using current LLMs as proxies for human decision-making in critical applications.
🧠 Claude · 🧠 Gemini
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce LifeBench, a new AI benchmark that tests long-term memory systems by requiring integration of both declarative and non-declarative memory across extended timeframes. Current state-of-the-art memory systems achieve only 55.2% accuracy on this challenging benchmark, highlighting significant gaps in AI's ability to handle complex, multi-source memory tasks.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠Researchers analyzed Meta's NLLB-200 neural machine translation model across 135 languages, finding that it has implicitly learned universal conceptual structures and the genealogical relationships among languages. The study reveals that the model creates language-neutral conceptual representations similar to how multilingual brains organize information, with semantic relationships preserved across diverse languages.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers propose the Compression Efficiency Principle (CEP) to explain why artificial neural networks and biological brains develop similar representations despite different substrates. The theory suggests both systems converge on efficient compression strategies that encode stable invariants rather than unstable correlations, providing a unified framework for understanding intelligence across biological and artificial systems.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce the Cognitive Categorical Transformer (CCT), a 306M-parameter language model that applies category-theoretic principles to improve upon GPT-2 Small, achieving a 12% relative perplexity reduction on WikiText-103. The work provides empirical evidence that simplicial message passing enhances language modeling performance and identifies a distinction between topology-adding and consistency-enforcing categorical priors.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers propose a Cognitive Taxonomy framework to measure progress toward AGI by evaluating systems against 10 key cognitive faculties derived from psychology and neuroscience research. The framework aims to address the lack of standardized metrics for AGI advancement and provide empirical evaluation methods to support responsible AI governance.
AI · Neutral · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce Contextual Alternative Choice (CAC), a new evaluation method that measures both syntactic and functional properties of language models using metrics derived from child language acquisition studies. While some large language models approach human-level performance on these benchmarks, none trained on comparable data volumes simultaneously meet both formal and functional standards that children achieve early in development.
AI · Neutral · arXiv – CS AI · 5d ago · 5/10
🧠This arXiv paper proposes the Sensation Modulating Network (SMN), a theoretical cognitive architecture that attempts to bridge the long-standing divide between cognitivism and embodied cognition approaches. The framework grounds meaning-making in the body's opponent dynamics and hierarchical action patterns, offering a novel perspective on how agents achieve intentional directedness without requiring additional computational modules.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers studying 21 large language models found a significant 'grounding gap' in how LLMs understand abstract concepts compared to humans. While LLMs rely heavily on word associations, they systematically under-reproduce emotional and internal-state properties, achieving a maximum correlation of r=0.37 versus human-to-human baselines above r=0.9. The findings suggest current models can identify grounding dimensions when explicitly queried but fail to recruit them naturally during free generation.
AI · Neutral · arXiv – CS AI · May 12 · 5/10
🧠Researchers extend the COLIBRI fuzzy color model to reveal that human color categories exhibit significant perceptual asymmetry, with yellow forming a narrow, sharply-defined region while green spans a broader interval. This finding challenges computational models that assume uniformly distributed color representations and suggests color naming follows non-uniform geometric organization in perceptual space.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers present Bounded Pragmatic Listener (BPL), a Bayesian framework that models how cognitive limitations affect susceptibility to misinformation. The framework incorporates three cognitively grounded constraints—working memory limits, information bottlenecks, and saliency-weighted sampling—to predict vulnerability to disinformation across benchmark datasets.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce the Developmental Sentence Completion Test (DSCT), a 20-item assessment tool that evaluates how large language models understand and reflect human developmental cognition based on Kegan's constructive-developmental theory. The study finds that frontier LLMs accurately identify developmental stages in simulated personas but show only fair agreement with real human responses, revealing that the developmental signal is cleaner in synthetic data than in human-generated text.
🏢 Meta
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that humans learn abstractions prospectively rather than retrospectively when facing non-stationary task environments. Using a visual program synthesis experiment called the Pattern Builder Task, they show that human library learning anticipates future task structures rather than merely compressing past experience, a capability that existing algorithmic approaches and LLM-based models fail to replicate.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose a computational model that evaluates explanations by converting them into executable action plans through large language models and planning agents. Across four experiments with 1,200 explanations, higher-scored explanations correlate with improved navigation performance and user helpfulness judgments, demonstrating that explanation quality can be measured by practical outcomes under uncertainty.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠Researchers compared frontier Large Reasoning Models (LRMs) with traditional AI systems using human gameplay data paired with fMRI brain recordings. LRMs demonstrated superior alignment with human learning behavior and predicted brain activity an order of magnitude better than reinforcement learning alternatives, suggesting they more closely mirror human cognition during complex decision-making.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose a Multi-Memory Segment System (MMS) that improves how AI agents generate and store long-term memories by moving beyond simple summarization. The system creates structured retrieval and contextual memory units inspired by cognitive psychology, enabling more effective use of historical data and better response quality in agent interactions.
AI · Neutral · arXiv – CS AI · May 4 · 6/10
🧠Researchers compared how large language models, humans, and algorithms approach the exploration-exploitation tradeoff in multi-armed bandit decision-making tasks. The study finds that enabling thinking processes in LLMs makes them behave more like humans in simple environments, but LLMs fail to match human adaptability in complex, non-stationary settings despite similar regret outcomes.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce MEDS (Math Education Digital Shadows), a dataset of 28,000 personas from 14 LLMs designed to evaluate how language models reason about mathematics and report their confidence levels. The dataset integrates math proficiency with psychological measures like anxiety and self-efficacy, revealing that LLMs exhibit human-like biases including negative attitudes and overconfidence in mathematical reasoning.
🧠 Grok