#qa-benchmarks News & Analysis

2 articles tagged with #qa-benchmarks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

Search-E1 introduces a simplified self-evolution method for search-augmented reasoning agents that reaches competitive performance using only vanilla GRPO and self-distillation, with no external supervision or complex auxiliary systems. With Qwen2.5-3B it reaches an average exact-match (EM) score of 0.440 on QA benchmarks, suggesting that elaborate post-training machinery may be unnecessary for effective agent development.
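For readers unfamiliar with the EM figure quoted above, here is a minimal sketch of how exact match is commonly scored on QA benchmarks. It assumes the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the summary does not say which normalization Search-E1 uses, and the function names `normalize_answer`, `exact_match`, and `average_em` are hypothetical.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (an assumption, not taken from the paper)."""
    text = text.lower()
    # Drop ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace English articles with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gold) for gold in gold_answers))


def average_em(predictions: list[str], references: list[list[str]]) -> float:
    """Mean exact-match score over paired predictions and gold-answer lists."""
    scores = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Under this convention, an average EM of 0.440 means that 44% of predictions matched a gold answer exactly after normalization.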

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

Researchers propose Credit-Attenuated Privileged Feedback (CAPF), a training mechanism that gives LLM search agents verifier feedback during training so they learn more effectively on difficult problems. It relies on information already available in reinforcement learning systems and raises exact-match accuracy on open-domain QA benchmarks from 44.7% to 48.5% (a gain of 3.8 percentage points) with Qwen3-4B.