AI · Bullish · arXiv – CS AI · 14h ago · 6/10
CoSToM: Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models
Researchers introduce CoSToM, a framework that uses causal tracing and activation steering to improve Theory of Mind alignment in large language models. The work addresses a critical gap between LLMs' internal knowledge and external behavior, demonstrating that targeted interventions in specific neural layers can enhance social reasoning capabilities and dialogue quality.