AI · Neutral · arXiv – CS AI · 6h ago · 6/10
CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback
Researchers propose Credit-Attenuated Privileged Feedback (CAPF), a training mechanism for LLM search agents that supplies verifier feedback during training rollouts to help the agent learn on difficult problems. The approach reuses information that reinforcement learning (RL) training pipelines already have. It improves performance on open-domain QA benchmarks, raising exact-match accuracy for Qwen3-4B from 44.7% to 48.5% (+3.8 points).
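The summary does not spell out how credit is "attenuated". One plausible reading can be sketched in a few lines. Assume that tokens the agent generates after receiving privileged verifier feedback get only a scaled-down share of the rollout's RL advantage, so the policy is not fully rewarded for answers it reached only with help. Everything here is a hypothetical illustration and not the paper's formula: the `Step` structure, the `token_advantages` function, and the `attenuation` factor are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One segment of a search-agent rollout (hypothetical structure)."""
    tokens: int           # number of policy-generated tokens in this segment
    after_feedback: bool  # True if generated after privileged feedback was injected


def token_advantages(steps: List[Step], advantage: float, attenuation: float) -> List[float]:
    """Assign a rollout-level advantage to each generated token.

    Tokens produced after privileged feedback receive `advantage * attenuation`
    (0 <= attenuation <= 1); all other tokens receive the full advantage.
    This attenuation rule is an illustrative assumption, not CAPF's published method.
    """
    if not 0.0 <= attenuation <= 1.0:
        raise ValueError("attenuation must lie in [0, 1]")
    per_token: List[float] = []
    for step in steps:
        # Scale credit down only for the feedback-assisted part of the rollout.
        scale = attenuation if step.after_feedback else 1.0
        per_token.extend([advantage * scale] * step.tokens)
    return per_token
```

Take a rollout with three unaided tokens followed by two tokens produced after feedback. With `token_advantages([Step(3, False), Step(2, True)], 1.0, 0.5)`, the first three tokens keep the full advantage of 1.0 and the last two receive 0.5. Setting `attenuation` to 0 would give the feedback-assisted tokens no credit at all, and setting it to 1 would treat them the same as unaided ones.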