Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games
Researchers developed Agentic BKT, a multi-agent LLM system that assesses financial literacy in educational games without interrupting gameplay. The architecture uses specialized AI agents to evaluate player decisions across four financial competency domains, and in a validation with 193 K-12 participants it showed roughly three times the predictive validity of single-LLM baselines.
This research addresses a critical gap in educational technology: measuring learning outcomes authentically within interactive environments. The Agentic BKT pipeline advances stealth assessment, in which evaluation happens during gameplay rather than through traditional tests that break immersion and engagement. The multi-agent decomposition assigns one specialized agent to each of four domains (risk mitigation, investing, spending, and credit management). This split mirrors the way financial competency is usually divided into distinct skills, and it lets each agent analyze behavior in its own domain more closely than a single monolithic model would.
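The summary does not describe the pipeline's internals, so the following is only a minimal sketch of how per-domain mastery estimates could be maintained, assuming the textbook Bayesian Knowledge Tracing (BKT) update. The event format (a domain name plus a flag saying whether a domain agent judged the decision sound), the parameter values, and all identifiers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# The four competency domains named in the article.
DOMAINS = ("risk_mitigation", "investing", "spending", "credit_management")


@dataclass
class BKTParams:
    # Illustrative values only; the paper's parameters are not given here.
    p_init: float = 0.2   # prior probability the skill is already mastered
    p_learn: float = 0.1  # probability of learning at each opportunity
    p_slip: float = 0.1   # mastered, but the decision was judged unsound
    p_guess: float = 0.2  # not mastered, but the decision was judged sound


def bkt_update(p_mastery: float, sound: bool, params: BKTParams) -> float:
    """Standard BKT step: Bayesian posterior, then the learning transition."""
    if sound:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(events, params=None):
    """Fold (domain, sound) events into one mastery estimate per domain.

    Each event stands in for the verdict a hypothetical domain agent
    would emit after classifying a logged gameplay decision.
    """
    params = params or BKTParams()
    mastery = {domain: params.p_init for domain in DOMAINS}
    for domain, sound in events:
        mastery[domain] = bkt_update(mastery[domain], sound, params)
    return mastery


if __name__ == "__main__":
    log = [("investing", True), ("spending", False), ("investing", True)]
    for domain, p in trace(log).items():
        print(f"{domain}: {p:.3f}")
```

Keeping one estimate per domain means a sound investing decision never raises the credit-management estimate, which is the practical benefit of decomposing the assessment by domain.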
The validation against OECD/INFE financial literacy frameworks grounds this work in recognized international standards, and the three-fold improvement in predictive validity over baseline LLM approaches indicates substantive technical progress. The system's estimates correlate with learning gains and post-test scores (r = 0.333, p < 0.0001) but not with pre-test scores. That pattern is evidence of discriminant validity: the estimates track what players learned during the game rather than what they already knew. The effect is moderate, however, since r = 0.333 corresponds to about 11% of shared variance.
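The validity check described above reduces to comparing two correlations. The sketch below shows the arithmetic in plain Python; the function names are hypothetical, the toy scores are synthetic and are not the study's data, and the t statistic assumes that all 193 participants entered the reported correlation, which the summary does not state.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Synthetic toy data: in-game mastery estimates and test scores.
    estimates = [0.31, 0.45, 0.52, 0.60, 0.74, 0.81]
    post_test = [4, 5, 5, 7, 8, 9]
    pre_test = [6, 3, 7, 4, 6, 5]

    # Discriminant validity expects the first value to be clearly
    # larger in magnitude than the second.
    print("estimate vs post-test:", round(pearson_r(estimates, post_test), 3))
    print("estimate vs pre-test: ", round(pearson_r(estimates, pre_test), 3))

    # Plugging in the reported r with the assumed sample size.
    print("t for r = 0.333, n = 193:", round(t_statistic(0.333, 193), 2))
```

A t value near 5 on 191 degrees of freedom is consistent with the reported p < 0.0001, so the significance claim and the modest effect size are compatible: the correlation is reliably nonzero without being large.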
For EdTech developers and financial education stakeholders, this architecture offers a scalable path to evidence-based assessment in serious games. The results suggest that specialized agents can extract meaningful competency indicators from unstructured gameplay logs. The same methodology could extend beyond financial literacy to other complex domains that require multidimensional skill assessment.
Future implementations should address scalability to larger player populations and integration with broader learning management systems. Because the framework relies on event logging and LLM classification, it likely depends on consistent game design patterns, which could limit how well it transfers across platforms.
- Multi-agent LLM architecture achieved three times the predictive validity of single-model baselines for financial literacy assessment in games
- System estimates correlated significantly with learning outcomes (r = 0.333, p < 0.0001) in a validation with 193 students grounded in OECD/INFE frameworks
- Domain-specific agent decomposition enabled nuanced behavioral analysis across four financial competency areas without disrupting gameplay
- Stealth assessment methodology captures authentic learning signals from open-ended player decisions rather than explicit testing
- The validated approach to stealth assessment could extend to other complex educational domains requiring multidimensional skill evaluation