
Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

arXiv – CS AI | Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani
🤖AI Summary

Researchers introduce Critic-R, a framework that improves agentic search systems by creating a feedback loop between the reasoning agent and the retrieval model. A critic model evaluates whether the retrieved context supports each reasoning step. Two mechanisms build on that signal: Critic-R-Zero refines queries at inference time, and Critic-Embed trains retrievers without manual annotations. The authors report improvements on multi-hop question-answering benchmarks.

Analysis

Critic-R addresses a fundamental challenge in agentic AI systems: the gap between how retrieval models are traditionally optimized and how they are actually used in reasoning pipelines. Conventional retrieval optimization relies on relevance judgments for isolated queries, but agentic systems need retrievers that understand the broader reasoning context and support multi-step problem-solving across iterations. This mismatch has forced developers to either co-train components end-to-end or manually annotate large datasets, both of which are expensive and impractical at scale.

The framework's two mechanisms offer a pragmatic way around this constraint. Critic-R-Zero operates at inference time: the system refines its queries and retrieval instructions based on the agent's own introspective feedback about whether the retrieved evidence is sufficient for the next reasoning step. Critic-Embed carries the same idea into training by automatically labeling successful and failed refinement trajectories and using them as supervision signals. This removes the need for human relevance annotations, which lowers the cost of deploying such systems.
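The inference-time loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `refine_and_retrieve`, `Feedback`, `toy_retrieve`, `toy_critic`, and `toy_refine` are hypothetical, the retriever is a word-overlap ranker over a three-sentence corpus, and the critic is a hand-written rule standing in for a language model that judges sufficiency and writes a natural-language critique.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    sufficient: bool  # does the context support the next reasoning step?
    critique: str     # natural-language note on what is still missing


# One attempt: (query used, documents retrieved, critic verdict)
Attempt = Tuple[str, List[str], bool]


def refine_and_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    critic: Callable[[str, List[str]], Feedback],
    refine: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[List[str], List[Attempt]]:
    """Retrieve, ask the critic, and rewrite the query until it is satisfied."""
    trajectory: List[Attempt] = []
    docs: List[str] = []
    for _ in range(max_rounds):
        docs = retrieve(query)
        feedback = critic(query, docs)
        trajectory.append((query, docs, feedback.sufficient))
        if feedback.sufficient:
            break
        query = refine(query, feedback.critique)
    return docs, trajectory


# --- Toy stand-ins (hypothetical, for illustration only) ---
CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris is the capital of France.",
]


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(query: str, k: int = 1) -> List[str]:
    # Rank documents by word overlap with the query and keep the top k.
    q = _tokens(query)
    return sorted(CORPUS, key=lambda d: -len(q & _tokens(d)))[:k]


def toy_critic(query: str, docs: List[str]) -> Feedback:
    # Rule-based placeholder for an LLM critic.
    text = " ".join(docs)
    if "capital of" in text:
        return Feedback(True, "")
    match = re.search(r"born in (\w+)", text)
    hint = f"country whose capital is {match.group(1)}" if match else "capital and country"
    return Feedback(False, hint)


def toy_refine(query: str, critique: str) -> str:
    # Simplest possible rewrite: use the critique as the next query.
    return critique


docs, trajectory = refine_and_retrieve(
    "Marie Curie birthplace country", toy_retrieve, toy_critic, toy_refine
)
for attempt_query, attempt_docs, ok in trajectory:
    print(ok, "|", attempt_query, "->", attempt_docs)
```

In this toy run the first query surfaces only the birthplace sentence, the critic flags the missing hop, and the rewritten query retrieves the sentence needed for the second hop. A real system would presumably replace the three toy functions with a dense retriever and model-generated feedback.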

For the broader AI infrastructure landscape, this work suggests that retriever optimization can shift from static, annotation-dependent models toward dynamic, self-improving systems grounded in agent behavior. Results on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle show gains in both retrieval quality and final answer accuracy, which supports the approach across four multi-hop QA benchmarks. Organizations building retrieval-augmented generation (RAG) systems, particularly those supporting complex multi-step reasoning, stand to benefit from similar feedback-loop architectures.

Key Takeaways
  • Critic-R creates explicit feedback loops between reasoning agents and retrieval models to improve multi-hop question answering without manual relevance annotations.
  • Critic-R-Zero enables runtime query refinement based on introspective agent feedback about whether retrieved context supports the next reasoning step.
  • Critic-Embed automatically generates supervision signals from successful and failed refinement trajectories, eliminating costly manual annotation requirements.
  • The framework reports improvements in both retrieval quality and downstream answer accuracy across four multi-hop QA benchmarks: HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle.
  • The approach reduces the deployment friction of agentic search by enabling self-improving retriever optimization through agent-driven feedback.
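
The Critic-Embed takeaway, turning refinement trajectories into supervision, can be sketched as a small data-preparation step. This is a guess at one plausible labeling rule, not the paper's procedure: it assumes documents from critic-approved attempts serve as positives for the original query and documents from rejected attempts serve as negatives. The names `Triple` and `trajectory_to_triples` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

# One attempt: (query used, documents retrieved, critic judged it sufficient?)
Attempt = Tuple[str, List[str], bool]


class Triple(NamedTuple):
    query: str            # the original, unrefined query
    positive: str         # a document from a critic-approved attempt
    negatives: List[str]  # documents seen only in rejected attempts


def trajectory_to_triples(trajectory: List[Attempt]) -> List[Triple]:
    """Convert one refinement trajectory into contrastive training triples.

    Assumed rule (not from the source): approved documents become positives,
    rejected documents become negatives, and trajectories that never
    succeed yield no triples.
    """
    if not trajectory:
        return []
    # dict.fromkeys removes duplicates while keeping first-seen order.
    positives = list(dict.fromkeys(
        d for _, docs, ok in trajectory if ok for d in docs
    ))
    if not positives:
        return []
    negatives = list(dict.fromkeys(
        d for _, docs, ok in trajectory if not ok
        for d in docs if d not in positives
    ))
    original_query = trajectory[0][0]
    return [Triple(original_query, p, negatives) for p in positives]


example = [
    ("q0", ["doc_a", "doc_b"], False),  # critic rejected this attempt
    ("q1", ["doc_b", "doc_c"], True),   # critic accepted this attempt
]
for triple in trajectory_to_triples(example):
    print(triple)
```

Triples in this shape could feed a standard contrastive objective for an embedding model. Pairing positives with the original query rather than the refined one is a design choice made here so that the retriever learns to find the right evidence without needing the rewrite.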