🧠 AI⚪ NeutralImportance 7/10

In LLM Reasoning, there is Irrationality on top of Value Misalignment

arXiv – CS AI|Kejiang Qian, Fengxiang He|June 23, 2026 at 04:00 AM

🤖AI Summary

Researchers identify 'rational value risk' in large language models, showing that even well-aligned LLMs fail to consistently maximize their intended values during reasoning tasks. The study across major models (Llama, GPT, DeepSeek) reveals that value alignment training alone cannot eliminate this reasoning gap, with performance highly dependent on inference-time strategies.

Analysis

This research addresses a critical gap between LLM training objectives and actual reasoning behavior. While considerable effort has been invested in aligning language models with human values during training, this study demonstrates that alignment doesn't guarantee optimal decision-making when models engage in complex reasoning. The concept of rational value risk quantifies how much utility a model loses by not pursuing the mathematically optimal reasoning strategy for its stated values. Researchers decomposed this gap into three sources: limited candidate responses, insufficient prompt variations, and imperfect verification mechanisms.

The findings carry substantial implications for AI reliability and deployment. Across multiple model families and benchmarks, the research confirms that rational value risk is systematic rather than anomalous. Notably, value alignment efforts during training reduce but cannot eliminate this risk, suggesting the problem operates at a different level than traditional alignment concerns. The strong sensitivity to inference-time reasoning strategies indicates that how models are prompted and guided during deployment significantly impacts their ability to achieve their trained objectives.

For AI developers and enterprise users, this highlights that post-deployment optimization remains critical even after rigorous alignment training. The observation that longer reasoning improves rationality with diminishing returns suggests scaling reasoning computation has practical limits. This research points toward the need for runtime value verification and adaptive reasoning strategies rather than relying solely on training-time alignment. Future work likely focuses on developing better verification mechanisms and reasoning frameworks that can more consistently bridge the gap between aligned values and rational execution.

Key Takeaways

→Even well-aligned LLMs exhibit systematic rational value risk—failing to optimize their stated values during reasoning tasks.
→Value alignment training reduces but cannot eliminate the gap between deployed and optimal reasoning strategies.
→Inference-time reasoning strategy selection strongly influences whether models achieve their trained objectives.
→The rational value risk stems from finite candidate responses, limited prompt sampling, and imperfect verification mechanisms.
→Longer reasoning improves model rationality with diminishing returns, suggesting practical limits to scaling reasoning compute.

Mentioned in AI

Models

GPT-5OpenAI

LlamaMeta

#llm-reasoning #value-alignment #rational-value-risk #inference-optimization #model-behavior #ai-safety #llm-evaluation

Read Original →via arXiv – CS AI

Act on this with AI

Stay ahead of the market.

Connect your wallet to an AI agent. It reads balances, proposes swaps and bridges across 15 chains — you keep full control of your keys.

Connect Wallet to AI →How it works

AIMay 6