y0news

#credit-assignment News & Analysis

44 articles tagged with #credit-assignment. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

The Terminal Representation in Reinforcement Learning

Researchers introduce the Terminal Representation (TR), a novel approach to representation learning in reinforcement learning that encodes reward-weighted trajectories more efficiently than existing methods. The TR achieves comparable performance to established approaches like the Default Representation while reducing computational overhead and eliminating assumptions about symmetric transition dynamics.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

Researchers introduce Graph-Distance Contribution Reward (GDCR), a novel step-level credit assignment method for agentic search that evaluates individual agent actions by measuring progress toward answer nodes in knowledge graphs. Combined with Step Advantage Policy Optimization (SAPO), this approach improves upon trajectory-level reward systems that cannot assess the quality of intermediate steps, showing strong results across multiple benchmarks.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Researchers propose a novel method for optimizing multi-agent LLM systems by decomposing credit assignment into temporal and structural components, enabling more efficient prompt optimization through targeted refinement rather than global updates. The approach uses state-space bottleneck analysis and role-based policy isolation to identify and fix weak components in collaborative AI systems, reducing computational queries while improving reasoning performance across benchmarks.

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Graph-Enhanced Policy Optimization in LLM Agent Training

Researchers present Graph-Enhanced Policy Optimization (GEPO), a new training framework for multi-step LLM agents that improves credit assignment by analyzing state-transition graphs and task relevance. The method reports performance gains of 1.1% to 3.8% across multiple benchmarks by weighting individual steps and trajectories according to their structural and semantic roles.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning

Researchers introduce TRACER, a reinforcement learning framework that enables multiple large language models to collaborate effectively on reasoning tasks by learning when to speak and what to say through turn-level decision-making. The approach addresses key challenges in multi-agent AI systems including sparse rewards, computational inefficiency, and oscillating performance, demonstrating improvements across mathematical reasoning benchmarks.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning

StepOPSD introduces a novel reinforcement learning framework that improves credit assignment in multi-turn agent tasks by treating individual steps rather than entire trajectories as the unit of learning. The method achieves state-of-the-art results on benchmark tasks like ALFWorld and Search-QA, demonstrating that step-level preference distillation is particularly effective when trajectory rewards poorly correlate with individual decision quality.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

Researchers introduce MemQ, a novel framework that applies Q-learning eligibility traces to episodic memory in large language model agents, enabling credit assignment across memory dependencies recorded in provenance DAGs. The approach achieves superior performance across six diverse benchmarks, with gains up to 5.7 percentage points on multi-step tasks requiring deep memory chains.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

Researchers introduce PiCA (Pivot-Based Credit Assignment), a novel reinforcement learning mechanism that improves how LLM-based search agents learn from long sequences of actions. By identifying key pivot steps and anchoring rewards to final task outcomes, PiCA addresses critical challenges in credit assignment, delivering 15.2% performance gains on knowledge-intensive QA tasks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Verifiable Process Rewards for Agentic Reasoning

Researchers introduce Verifiable Process Rewards (VPR), a framework that enhances reinforcement learning for large language models by providing dense, intermediate-level feedback during reasoning tasks rather than relying solely on sparse outcome-level rewards. The approach leverages symbolic, algorithmic, and probabilistic verification methods to improve credit assignment in long-horizon agentic reasoning, with theoretical and empirical validation across multiple benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents

Researchers introduce EGL-SCA, a framework for graph reasoning agents that jointly optimizes both natural language instructions and computational tools through structural credit assignment. The system achieves 92.0% success rate on graph reasoning benchmarks by precisely routing failures to either prompt optimization or tool synthesis, outperforming isolated improvement approaches.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Structured Role-Aware Policy Optimization for Multimodal Reasoning

Researchers introduce Structured Role-Aware Policy Optimization (SRPO), a reinforcement learning method that improves multimodal AI reasoning by assigning credit to different token types based on their functional roles. The approach enhances vision-language models' ability to ground answers in visual evidence without requiring external reward models, advancing more reliable multimodal reasoning systems.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Learning CLI Agents with Structured Action Credit under Selective Observation

Researchers present a new approach to training CLI agents through reinforcement learning, introducing σ-Reveal for selective observation and A³ for credit assignment. The work addresses fundamental challenges in teaching AI systems to interact with command-line interfaces by leveraging structured action properties and proposing the ShellOps dataset for evaluation.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

In-Context Credit Assignment via the Core

Researchers propose a new mechanism for fairly distributing compensation among creators whose intellectual property appears in AI model context windows, using cooperative game theory's least core solution. The approach efficiently approximates fair value distribution while requiring significantly fewer computational resources than existing methods.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Exact Is Easier: Credit Assignment for Cooperative LLM Agents

Researchers present C3, a novel credit assignment method for cooperative multi-agent LLM systems that achieves exact causal measurement without approximation by exploiting deterministic interaction histories. The method outperforms existing baselines across six benchmarks while reducing training costs, and introduces the first method-agnostic auditing tools for evaluating multi-agent credit assignment quality.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

Skill1 presents a unified reinforcement learning framework that enables language model agents to co-evolve three coupled capabilities (skill selection, skill utilization, and skill distillation) from a single task-outcome reward signal. Its reported improvements over existing baselines on complex tasks suggest progress in how AI agents can build and reuse persistent skill libraries across diverse problem domains.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Owen-Shapley Policy Optimization: A Principled RL Algorithm for Generative Search LLMs

Researchers introduce Owen-Shapley Policy Optimization (OSPO), a reinforcement learning algorithm that improves how language models learn from feedback by attributing credit to individual tokens rather than treating entire sequences as atomic units. The method addresses a fundamental training gap in generative AI systems used for recommendation tasks, showing measurable improvements on real e-commerce datasets.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning

PORTool is a new policy-optimization algorithm that improves how AI agents learn to use external tools by solving the credit-assignment problem in multi-step reasoning tasks. The method uses a rewarded tree structure to assign rewards at individual steps rather than only at outcomes, enabling agents to achieve higher accuracy while reducing unnecessary tool calls.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

Researchers propose T-STAR, a novel reinforcement learning framework that structures multi-step agent trajectories as trees rather than independent chains, enabling better credit assignment for LLM agents. The method uses tree-based reward propagation and surgical policy optimization to improve reasoning performance across embodied, interactive, and planning tasks.

AI · Neutral · arXiv – CS AI · Mar 11 · 4/10

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

Researchers propose CORA, a new cooperative game-theoretic method for credit assignment in multi-agent reinforcement learning that uses coalition-wise advantage allocation. The approach addresses policy optimization challenges by evaluating marginal contributions of different agent coalitions and demonstrates superior performance across various benchmarks.
