AINeutralarXiv – CS AI · 3h ago6/10
🧠
TRACER: Turn-level Regret Matching with Inner Reinforcement Credit for Cooperative Multi-LLM Reasoning
Researchers introduce TRACER, a reinforcement learning framework that enables multiple large language models to collaborate effectively on reasoning tasks by learning when to speak and what to say through turn-level decision-making. The approach addresses key challenges in multi-agent AI systems including sparse rewards, computational inefficiency, and oscillating performance, demonstrating improvements across mathematical reasoning benchmarks.