AINeutralarXiv โ CS AI ยท 5d ago6/104
๐ง
Cognitive models can reveal interpretable value trade-offs in language models
Researchers developed a framework using cognitive models from psychology to analyze value trade-offs in language models, revealing how AI systems balance competing priorities like politeness and directness. The study shows LLMs' behavioral profiles shift predictably when prompted to prioritize certain goals and are influenced by reasoning budgets and training dynamics.