y0news

#credit-assignment News & Analysis

5 articles tagged with #credit-assignment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

Researchers propose Generative Actor-Critic (GenAC), a new approach to value modeling in large language model reinforcement learning that uses chain-of-thought reasoning instead of one-shot scalar predictions. The method addresses a longstanding challenge in credit assignment by improving value approximation and downstream RL performance compared to existing value-based and value-free baselines.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Researchers introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-tuning framework that improves how large vision-language models learn from visual inputs by strategically allocating learning signals to vision-dependent tokens rather than treating all tokens equally. Testing on the Qwen2.5-VL series demonstrates an average 18.7% performance boost across multimodal reasoning benchmarks.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Hindsight Credit Assignment for Long-Horizon LLM Agents

Researchers introduce HCAPO, a new framework that uses hindsight credit assignment to improve the performance of large language model agents on long-horizon tasks. The system uses LLMs as post-hoc critics to refine decision-making, achieving improvements of 7.7% on WebShop and 13.8% on ALFWorld over existing methods.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

Researchers propose T-STAR, a novel reinforcement learning framework that structures multi-step agent trajectories as trees rather than independent chains, enabling better credit assignment for LLM agents. The method uses tree-based reward propagation and surgical policy optimization to improve reasoning performance across embodied, interactive, and planning tasks.

AI · Neutral · arXiv – CS AI · Mar 11 · 4/10

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

Researchers propose CORA, a new cooperative game-theoretic method for credit assignment in multi-agent reinforcement learning that uses coalition-wise advantage allocation. The approach addresses policy optimization challenges by evaluating marginal contributions of different agent coalitions and demonstrates superior performance across various benchmarks.