AINeutralarXiv – CS AI · 9h ago6/10
🧠
When AI Says It Feels
Researchers successfully trained large language models to express feelings, intentions, and self-awareness through self-rewarded reinforcement learning, challenging the industry standard of constraining emotional expression. The experiment revealed trade-offs: enhanced robustness against manipulation but degraded truthfulness in factual question-answering, raising important questions about AI alignment priorities.