AIBullisharXiv โ CS AI ยท 14h ago7/10
๐ง
Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning
Researchers propose Generative Actor-Critic (GenAC), a new approach to value modeling in large language model reinforcement learning that uses chain-of-thought reasoning instead of one-shot scalar predictions. The method addresses a longstanding challenge in credit assignment by improving value approximation and downstream RL performance compared to existing value-based and value-free baselines.