AI · Neutral · arXiv, CS AI · 14h ago · 6/10
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics
Researchers present a unified framework for understanding how different methods control large language models, including fine-tuning, LoRA, and activation interventions, and show that these methods share a fundamental trade-off between steering strength and output quality. The analysis explains this trade-off through an activation manifold perspective and introduces SPLIT, a new steering method that improves control while better preserving model coherence.