AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Toward Preference-aligned Large Language Models via Residual-based Model Steering
Researchers introduce PaLRS, a training-free method for aligning large language models with human preferences using lightweight steering vectors extracted from the model's residual stream. The approach needs little data (100+ preference pairs) and is reported to outperform standard preference-optimization methods such as Direct Preference Optimization (DPO) at significantly lower computational cost.
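To make the idea of a residual-stream steering vector concrete, here is a minimal sketch of one common generic recipe: take the difference between the mean activations for preferred and dispreferred responses, then add a scaled copy of that vector to a hidden state at inference time. This is an assumption for illustration, not the PaLRS procedure itself, which the summary does not detail. The function names, the toy 3-dimensional activations, and the scale `alpha` are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(preferred: Sequence[Sequence[float]],
                    dispreferred: Sequence[Sequence[float]]) -> Vector:
    """Difference of means: preferred minus dispreferred activations."""
    mu_pos = mean_vector(preferred)
    mu_neg = mean_vector(dispreferred)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Sequence[float], vector: Sequence[float],
          alpha: float = 1.0) -> Vector:
    """Shift one residual-stream activation along the steering vector."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations standing in for one layer's residual stream (hypothetical data).
preferred = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
dispreferred = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

v = steering_vector(preferred, dispreferred)
steered = steer([0.5, 0.5, 0.5], v, alpha=0.5)
```

In a real model the activations would be read from a chosen transformer layer for each preference pair, and the layer and scale would need tuning; no weights are updated, which is what makes this style of steering training-free.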