AINeutralarXiv – CS AI · Apr 146/10
🧠
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
Researchers present a unified framework for understanding how different methods control large language models—including fine-tuning, LoRA, and activation interventions—revealing a fundamental trade-off between steering strength and output quality. The analysis explains this through an activation manifold perspective and introduces SPLIT, a new steering method that improves control while better preserving model coherence.