AI · Bullish · arXiv – CS AI · 7h ago · 6/10
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
Researchers introduce LongTraceRL, a reinforcement learning method that improves large language models' ability to reason over long documents by using search agent trajectories and entity-level reward signals. The approach generates challenging training contexts that contain high-confusability distractors and applies rubric rewards that supervise intermediate reasoning steps. The authors report consistent improvements across multiple LLM sizes and benchmarks.
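
The summary does not say how the rubric rewards are computed, so the following is only a toy sketch of what an entity-level rubric reward could look like, not the paper's formulation. It assumes the rubric is a list of entities that a good reasoning trace should mention, and it blends that coverage with final-answer correctness. The function name `rubric_reward`, its parameters, and the 0.5 weighting are all hypothetical.

```python
def rubric_reward(response, rubric_entities, gold_answer, final_answer,
                  rubric_weight=0.5):
    """Toy reward (illustrative assumption, not the paper's definition).

    Blends final-answer correctness with the fraction of rubric entities
    that appear in the model's reasoning text.
    """
    text = response.lower()

    # Entity-level signal: how many rubric entities the reasoning mentions.
    hits = sum(1 for entity in rubric_entities if entity.lower() in text)
    coverage = hits / len(rubric_entities) if rubric_entities else 0.0

    # Outcome signal: case-insensitive exact match on the final answer.
    correct = 1.0 if final_answer.strip().lower() == gold_answer.strip().lower() else 0.0

    return (1 - rubric_weight) * correct + rubric_weight * coverage


# Example: correct answer, but only one of two rubric entities is mentioned.
score = rubric_reward(
    response="Marie Curie worked in Paris.",
    rubric_entities=["Marie Curie", "Warsaw"],
    gold_answer="Paris",
    final_answer="paris",
)
print(score)
```

In this sketch a response can earn partial credit for citing the right intermediate entities even when the final answer is wrong, which is the general idea behind supervising intermediate steps instead of only the outcome.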