Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration
Researchers introduce ALMANAC, a dataset of 2,987 annotated human collaboration actions intended to help AI agents learn to maintain mental models during teamwork. Built from the Map Task routing exercise, the dataset includes theory-informed annotations of each participant's own reasoning, their perception of the partner's intent, and their understanding of shared goals. It addresses a critical gap in training collaborative AI systems, which are rarely trained on anything beyond task completion.
Large language model agents capable of multi-step reasoning and planning are increasingly seen as potential collaborators for humans, yet a fundamental capability remains underdeveloped: maintaining accurate mental models during interaction. ALMANAC addresses this gap by providing the first large-scale dataset with action-level annotations that capture not just what collaborators do, but why they do it and what they believe their partners intend. The work rests on the premise that effective collaboration requires continuous alignment on three dimensions: self-reasoning, perceived partner intentions, and shared objectives.
The dataset builds on decades of social science research using the Map Task, a classic dyadic routing exercise in which one participant guides a partner along a route that only the guide can see, which naturally generates rich collaborative interactions. By formalizing mental model annotations at the action level, the researchers build a bridge between classical collaboration research and modern AI training methods. This process-level focus contrasts sharply with current LLM optimization, which prioritizes task-completion metrics over process-level competencies.
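To make the annotation scheme concrete, the sketch below models one annotated action as a Python record. It is a minimal illustration under stated assumptions: ALMANAC's actual file format and field names are not given here, so `AnnotatedAction`, `MentalModelAnnotation`, every field name, and the sample utterance are hypothetical. The `giver`/`follower` roles follow the classic Map Task setup rather than anything confirmed about the dataset's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MentalModelAnnotation:
    """Hypothetical container for the three annotated dimensions."""
    self_reasoning: str            # why the actor took this action
    perceived_partner_intent: str  # what the actor believes the partner intends
    shared_goal: str               # the actor's understanding of the joint objective


@dataclass
class AnnotatedAction:
    """One observable action plus its mental-model annotation (hypothetical schema)."""
    dialogue_id: str   # identifier of one Map Task session
    turn_index: int    # position of the action within the session
    speaker_role: str  # "giver" or "follower" (assumed roles)
    utterance: str     # the observable action itself
    annotation: MentalModelAnnotation

    def __post_init__(self) -> None:
        # Assumed roles from the classic Map Task: one participant describes a
        # route, the other tries to reproduce it on their own map.
        if self.speaker_role not in {"giver", "follower"}:
            raise ValueError(f"unknown role: {self.speaker_role!r}")

    def to_json(self) -> str:
        # asdict() recurses into the nested annotation dataclass.
        return json.dumps(asdict(self))


action = AnnotatedAction(
    dialogue_id="session-001",
    turn_index=3,
    speaker_role="follower",
    utterance="Do you have a lake on your map?",
    annotation=MentalModelAnnotation(
        self_reasoning="My map shows a lake, but I am unsure the giver's does.",
        perceived_partner_intent="The giver wants me to head toward the lake.",
        shared_goal="Reproduce the giver's route on my map.",
    ),
)
print(action.to_json())
```

Keeping the three dimensions as separate fields, rather than one free-text note per action, mirrors the per-dimension framing of the annotations and lets each dimension be inspected or scored on its own.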
Benchmarks of six LLMs show substantial room for improvement, both in simulating human collaborative behavior and in inferring the mental models behind it. The work is relevant to developers building AI systems for customer service, healthcare coordination, and scientific research, domains where collaboration quality directly affects outcomes. The dataset provides a foundation for training agents that understand not just task requirements but the collaborative context in which they operate.
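One way to read "inferring underlying mental models" is as a prediction task: given the dialogue context, a model outputs a value for each annotation dimension, and each output is compared against the human annotation. The sketch below assumes that framing. The metric (a bag-of-words token F1), the names `token_f1`, `evaluate`, and `echo_baseline`, and the toy gold example are hypothetical stand-ins, not the benchmark's actual protocol, which is not described here.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical names for the three annotated dimensions.
DIMENSIONS = ("self_reasoning", "perceived_partner_intent", "shared_goal")


def token_f1(pred: str, gold: str) -> float:
    """Bag-of-words F1 between two strings (a stand-in metric, not the paper's)."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(p == g)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    predict: Callable[[str], Dict[str, str]], examples: List[dict]
) -> Dict[str, float]:
    """Mean per-dimension score of `predict` over gold-annotated actions."""
    if not examples:
        raise ValueError("need at least one example")
    totals = {d: 0.0 for d in DIMENSIONS}
    for ex in examples:
        pred = predict(ex["context"])
        for d in DIMENSIONS:
            # A dimension missing from the prediction is scored as an empty string.
            totals[d] += token_f1(pred.get(d, ""), ex["annotation"][d])
    return {d: totals[d] / len(examples) for d in DIMENSIONS}


# Toy gold example; the text is invented for illustration.
gold = [
    {
        "context": "Giver: go round the lake. Follower: do you have a lake?",
        "annotation": {
            "self_reasoning": "unsure whether the giver sees the lake",
            "perceived_partner_intent": "giver wants me to pass the lake",
            "shared_goal": "reproduce the route",
        },
    }
]


def echo_baseline(context: str) -> Dict[str, str]:
    """Trivial baseline: predicts the raw context for every dimension."""
    return {d: context for d in DIMENSIONS}


print(evaluate(echo_baseline, gold))
```

Token overlap is a crude proxy for free-text annotations; depending on how ALMANAC encodes its labels, a real evaluation might instead use categorical accuracy or human judgments.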
Future work will likely expand ALMANAC with additional task domains and explore how mental model accuracy correlates with collaboration success metrics beyond task completion.
- →ALMANAC provides the first large-scale dataset with action-level mental model annotations for training collaborative AI agents.
- →Current LLM agents are largely optimized for task completion rather than process-level alignment, which leaves their collaborative competencies underdeveloped.
- →The dataset tracks three collaboration dimensions: self-reasoning, perceived partner intent, and shared goal understanding.
- →Benchmarks of six LLMs show significant gaps in their ability to simulate human collaborative behavior and infer mental models.
- →This work could support the development of AI systems for collaboration-dependent domains such as healthcare, customer service, and scientific research.