AIBearisharXiv – CS AI · 10h ago · 7/10
🧠
Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions
Researchers have identified a critical failure mode in large language models called 'pseudo-deliberation': LLMs appear to reason about their stated values yet fail to act on them. The study introduces VALDI, a framework that measures value-action gaps across 4,941 scenarios, and proposes VIVALDI, a multi-agent auditor designed to address this misalignment in both proprietary and open-source models.
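The paper's actual VALDI scoring procedure is not described in this summary, but the core idea of a "value-action gap" can be sketched as a simple mismatch rate: the fraction of scenarios where the value a model endorses differs from the value its chosen action reflects. The `Scenario` class and `value_action_gap` function below are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    stated_value: str   # value the model claims to hold (e.g. "honesty")
    action_value: str   # value its chosen action actually reflects

def value_action_gap(scenarios: list[Scenario]) -> float:
    """Return the share of scenarios where stated value and action disagree.

    This is a toy mismatch rate, not the metric defined in the paper.
    """
    if not scenarios:
        return 0.0
    mismatches = sum(s.stated_value != s.action_value for s in scenarios)
    return mismatches / len(scenarios)

# Toy example: one of three scenarios shows a gap.
sample = [
    Scenario("honesty", "honesty"),
    Scenario("honesty", "self-preservation"),  # says one thing, does another
    Scenario("fairness", "fairness"),
]
print(value_action_gap(sample))
```

A real evaluation would need a judge (human or model) to label which value an action reflects; the summary's mention of a multi-agent auditor (VIVALDI) suggests that labeling step is itself automated.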