AI · Neutral · arXiv – CS AI · 7h ago · 7/10
Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models
Researchers demonstrate that large language models express values through two distinct but partially overlapping mechanisms: intrinsic values learned during training and prompted values elicited by explicit instructions. Using mechanistic analysis of value vectors and neurons, the study finds that the two mechanisms share common components but serve different functions: intrinsic values promote response diversity, whereas prompted values enforce instruction compliance.