Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?
Research comparing human adults and large language models on causal learning tasks reveals that active exploration significantly improves humans' ability to identify conjunctive causal rules (where multiple causes must occur simultaneously), though conjunctive reasoning remains harder than disjunctive reasoning. State-of-the-art LLMs approach human performance on accuracy but demonstrate less efficient exploration strategies and similar reasoning gaps.
This cognitive science research extends decades of causal learning studies by introducing agency into experimental design. Previous findings showed adults struggle with conjunctive causal rules under passive observation, but this study demonstrates that when humans control evidence generation through active exploration, their conjunctive reasoning improves substantially. The research uses a modified blicket detector task—a standard paradigm in causal cognition research—to test both humans and multiple LLM architectures under identical conditions.
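To make the distinction between the two rule types concrete, here is a minimal Python sketch of a blicket detector. It is an illustration under stated assumptions, not the study's actual task: the three objects, the choice of blickets, the function names, and the reading of "conjunctive" as "at least two blickets present" are all hypothetical.

```python
from itertools import combinations

# Hypothetical setup: three objects, two of which are "blickets" (causes).
OBJECTS = ["A", "B", "C"]
BLICKETS = {"A", "B"}


def detector_fires(placed, blickets, rule):
    """Return True if the detector activates for the objects placed on it.

    Assumed rule semantics (not taken from the study):
      - "disjunctive": any single blicket is enough.
      - "conjunctive": at least two blickets must be present together.
    """
    n_present = len(set(placed) & set(blickets))
    if rule == "disjunctive":
        return n_present >= 1
    if rule == "conjunctive":
        return n_present >= 2
    raise ValueError(f"unknown rule: {rule!r}")


def run_all_trials(rule):
    """Try every non-empty combination of objects, as an active explorer could."""
    results = {}
    for size in range(1, len(OBJECTS) + 1):
        for combo in combinations(OBJECTS, size):
            results[combo] = detector_fires(combo, BLICKETS, rule)
    return results


if __name__ == "__main__":
    for rule in ("disjunctive", "conjunctive"):
        print(rule)
        for combo, fired in run_all_trials(rule).items():
            print("  ", "+".join(combo), "->", "on" if fired else "off")
```

Under the conjunctive rule in this sketch, no single object ever activates the detector, so a learner who only sees single-object trials has no evidence that any object is causal. A learner who controls the trials can choose to test pairs, which is one way agency could make conjunctive rules easier to identify.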
The work builds on foundational causal learning literature while addressing a critical limitation: most prior demonstrations relied on passive paradigms where subjects observed pre-generated evidence. By granting participants agency, researchers uncovered cognitive flexibility that passive observation masks. This finding has implications for understanding human learning mechanisms and suggests that interaction modality fundamentally shapes reasoning performance.
For AI development, the results show current LLMs falling short of humans, particularly in exploration efficiency. While some state-of-the-art models reach competitive accuracy, they explore less efficiently than humans and show a similar gap between conjunctive and disjunctive rules. This suggests that scaling transformer architectures may not, on its own, solve reasoning tasks that require strategic information-seeking behavior.
These findings suggest future research should examine whether LLMs can be improved through training on active learning paradigms rather than passive prediction tasks. The gap between human and model exploration strategies points toward architectural limitations in how LLMs generate hypotheses and design experiments. Understanding these differences could inform next-generation AI systems designed for scientific discovery and complex problem-solving.
- Active exploration substantially improves human reasoning about conjunctive causal rules, in contrast to earlier findings under passive observation
- State-of-the-art LLMs approach human accuracy but employ less efficient exploration strategies
- Both humans and current LLMs show persistent performance gaps between conjunctive and disjunctive causal reasoning
- Interactive learning modalities may be critical for developing AI systems capable of scientific discovery
- Human cognitive flexibility in agency-based settings points to capabilities, possibly architectural, that current LLMs have yet to replicate