AI · Neutral · arXiv – CS AI · 9h ago · 7/10
Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models
Researchers report that in some large language models, moral value inappropriately influences judgments of grammatical and economic quality. According to the study, these models conflate different kinds of value rather than keeping them distinct as humans do, and the problem can be partially corrected by targeted ablation of morality-related activation vectors.
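To make "ablation of an activation vector" concrete, here is a minimal sketch of directional ablation, a common interpretability technique: the component of an activation along a chosen direction is projected out. This is an illustration under stated assumptions and not necessarily the paper's exact procedure. The function name `ablate_direction`, the toy three-dimensional activation, and the `morality_direction` vector are all hypothetical.

```python
import math


def ablate_direction(activation, direction):
    """Return `activation` with its component along `direction` removed."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the activation onto the unit direction.
    coeff = sum(a * u for a, u in zip(activation, unit))
    # Subtract the projected component; the result is orthogonal to `direction`.
    return [a - coeff * u for a, u in zip(activation, unit)]


# Toy values (assumptions for illustration, not data from the study).
activation = [2.0, 1.0, -1.0]
morality_direction = [1.0, 0.0, 0.0]

ablated = ablate_direction(activation, morality_direction)
print(ablated)
```

In a real model, a direction of this kind would have to be estimated from the model's activations and the projection applied at one or more layers during the forward pass. Those details are outside what this summary states.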