#neural-intervention News & Analysis

2 articles tagged with #neural-intervention. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AIBullisharXiv – CS AI · May 17/10

🧠

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

Researchers propose a causally motivated method to reduce biases in reward models used for LLM alignment by identifying and suppressing neurons correlated with spurious features like response length. The technique achieves comparable performance to much larger models while editing less than 2% of neurons, suggesting biases are concentrated in early network layers.

AIBullisharXiv – CS AI · Apr 146/10

🧠

CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models

Researchers introduce CoSToM, a framework that uses causal tracing and activation steering to improve Theory of Mind alignment in large language models. The work addresses a critical gap between LLMs' internal knowledge and external behavior, demonstrating that targeted interventions in specific neural layers can enhance social reasoning capabilities and dialogue quality.