#value-estimation News & Analysis

5 articles tagged with #value-estimation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training

Researchers introduce ALOE, an off-policy evaluation framework designed to improve vision-language-action (VLA) models through better value function estimation from heterogeneous real-world data. The method addresses a critical challenge in robotic learning by enabling more accurate credit assignment and stable policy improvement across complex manipulation tasks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

Researchers argue that current LLM-agent oversight systems rely on scalar risk prediction when they should make intervention-aware decisions. Their framework measures intervention advantage (the actual utility gain from intervening) and reports that action-conditioned control significantly outperforms traditional calibrated risk scoring across multiple benchmarks.
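
To make the distinction concrete, here is a minimal sketch of how ranking by a risk score can differ from ranking by intervention advantage. It is not the paper's implementation: the `Step` fields, the example actions, and all numbers are hypothetical, and the advantage is taken to be simply the expected utility with intervention minus the expected utility without it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One proposed agent action (all fields are illustrative assumptions)."""
    name: str
    risk: float               # predicted probability the action goes wrong
    utility_allow: float      # expected utility if the action proceeds
    utility_intervene: float  # expected utility if the overseer steps in


def intervention_advantage(step: Step) -> float:
    """Utility gained by intervening instead of letting the action run."""
    return step.utility_intervene - step.utility_allow


steps = [
    # Moderate risk, but a failure is costly and intervening recovers most of it.
    Step("irreversible_delete", risk=0.30, utility_allow=-5.0, utility_intervene=-1.0),
    # Higher risk, but failures are cheap and intervening costs more than it saves.
    Step("flaky_but_harmless_retry", risk=0.60, utility_allow=-0.5, utility_intervene=-1.0),
]

# A risk-score policy flags the step with the highest predicted risk.
by_risk = max(steps, key=lambda s: s.risk).name
# An intervention-aware policy flags the step where intervening helps most.
by_advantage = max(steps, key=intervention_advantage).name

print(by_risk, by_advantage)
```

In this toy setup the two policies pick different steps, which is the gap between predicting risk well and deciding when to act.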

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

Researchers introduce ViVa, a video-generative value model that enhances robot reinforcement learning by predicting future proprioception and scalar values simultaneously. The approach achieves 80% success rates in manipulation tasks by grounding value estimation in anticipated embodiment dynamics, addressing limitations in existing vision-language models for long-horizon robotics applications.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning

Researchers introduce Hista and Numca, two novel techniques for improving state value estimation in large language model reinforcement learning. The work identifies a critical gap where standard RL approaches like PPO fail to accurately estimate state values, proposing solutions that leverage numerical spans and hidden state representations to enhance training stability and performance.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

Researchers introduce POISE, a reinforcement learning method that uses a language model's internal hidden states to estimate baseline values for policy optimization, eliminating the computational overhead of separate critic models. The approach demonstrates comparable performance to existing methods while requiring significantly less compute, enabling more efficient training of large reasoning models.
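
The general idea of reading a baseline off hidden states can be sketched as follows. This is an illustration under stated assumptions, not the POISE algorithm: the hidden states and rewards are synthetic, the linear least-squares probe is a stand-in for whatever estimator the paper uses, and the names `fit_linear_probe` and `predict_baseline` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: one pooled hidden-state vector per prompt and its reward.
n_prompts, hidden_dim = 64, 8
hidden = rng.normal(size=(n_prompts, hidden_dim))
true_w = rng.normal(size=hidden_dim)
rewards = hidden @ true_w + 0.1 * rng.normal(size=n_prompts)


def fit_linear_probe(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares fit of targets on features plus an intercept column."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights


def predict_baseline(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Predicted value for each prompt, used as the policy-gradient baseline."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ weights


probe = fit_linear_probe(hidden, rewards)
baseline = predict_baseline(probe, hidden)  # in-sample, for illustration only

# Advantage = reward minus baseline; compare with a plain mean-reward baseline.
advantages = rewards - baseline
mean_centered = rewards - rewards.mean()

print(np.var(advantages), np.var(mean_centered))
```

A baseline that tracks per-prompt difficulty leaves lower-variance advantages than a single mean reward, and a small probe on states the actor already computes avoids running a separate critic network.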