#information-retrieval News & Analysis

100 articles tagged with #information-retrieval. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments

Researchers introduce ScholarQuest, a large-scale benchmark for evaluating AI agents that search academic papers using language models. The benchmark tests agents across 1,000+ computer science topics with four research intent types, revealing that current agentic methods significantly outperform basic retrieval but still achieve only 31-36% recall, exposing substantial performance gaps in AI-driven literature discovery.
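
For readers less familiar with the metric, the recall figure quoted above is the share of relevant papers that a system actually returns. A minimal sketch, assuming set-based recall over paper IDs; the IDs and the `recall` helper are illustrative and not taken from the benchmark:

```python
def recall(retrieved, relevant):
    """Fraction of the relevant papers that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical paper IDs, for illustration only.
relevant_papers = ["p1", "p2", "p3", "p4", "p5", "p6"]
agent_results = ["p2", "p9", "p5", "p7"]

# Two of the six relevant papers were found.
score = recall(agent_results, relevant_papers)
print(f"recall = {score:.2f}")
```

A recall in the 31-36% range therefore means that roughly two thirds of the relevant papers were never surfaced.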

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

SkillJuror: Measuring How Agent Skill Organization Changes Runtime Behavior

Researchers introduce SkillJuror, a framework measuring how LLM agent skill organization affects runtime behavior independent of content. Testing Progressive Disclosure—a hierarchical skill structure—against flat baselines shows agents access 3.26x more resources and achieve 4.1% higher verification rates, revealing that procedural knowledge presentation meaningfully influences agent reasoning patterns.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

What Limits Does Quantization Place on Dense Top-$k$ Retrieval? A Theoretical Study

A theoretical study proves that quantization fundamentally limits dense top-k retrieval systems, requiring embedding dimension and precision to scale logarithmically with corpus size, contradicting prior corpus-independent bounds that assumed infinite precision. This finding has direct implications for practical vector databases and dense retrieval systems where quantization is standard practice.
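
The intuition behind the precision requirement can be seen in a toy example: at low bit widths, two nearby documents collapse onto the same quantized vector and can no longer be ranked apart. The sketch below assumes simple uniform scalar quantization over [-1, 1] and dot-product scoring; it is an illustration only and does not reproduce the paper's construction or bounds.

```python
def quantize(vec, bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantization: snap each coordinate to one of 2**bits levels."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical query and documents, chosen so the documents differ only slightly.
query = [1.0, 1.0]
doc_a = [0.52, 0.50]
doc_b = [0.50, 0.50]

for bits in (2, 8):
    score_a = dot(query, quantize(doc_a, bits))
    score_b = dot(query, quantize(doc_b, bits))
    print(bits, "bits:", "tie" if score_a == score_b else "a ranks above b")
```

At 2 bits the two documents become indistinguishable, while 8 bits preserves the full-precision ordering. As a corpus grows, such near-duplicates become more likely, which is consistent with the summary's claim that precision must grow with corpus size.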

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Agentic Hybrid RAG for Evidence-Grounded Muon Collider Analysis

Researchers present agentic hybrid RAG, a framework combining retrieval-augmented generation with agentic reasoning to improve scientific question answering in muon collider physics research. The work introduces the first benchmark for retrieval-augmented QA in high-energy physics, demonstrating that hybrid retrieval methods outperform traditional approaches for locating and synthesizing evidence from scientific literature.
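
The summary does not say how the lexical and dense results are combined. One common way to build a hybrid retriever is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the paper's method; the document IDs are hypothetical and `k=60` is the conventional RRF constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each document earns 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword retriever and an embedding retriever.
lexical_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while documents found by only one retriever still remain in the fused list.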

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval

Researchers introduce SkillResolve-Bench, a benchmark for evaluating agent skill retrieval systems that addresses the critical problem of selecting the correct skill variant when multiple capabilities are semantically similar. The benchmark includes 661 helper/risky skill pairs and proposes SkillResolve, a method that achieves safer procedural exposure by selecting appropriate skill representatives from capability families.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

Researchers introduce LakeQA, a benchmark for evaluating large language models on question answering over a massive data lake containing 9.5TB of heterogeneous data. Current LLMs struggle on it: GPT-5.2 reaches only 18.37% accuracy, highlighting the gap between reading-comprehension performance and real-world search-and-reasoning requirements.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

STORM: Stepwise Token Optimization with Reward-Guided Beam Search

Researchers introduce STORM, a self-supervised framework that optimizes lexical query expansion for information retrieval by using BM25 reward signals during generation. The approach enables smaller language models (0.6B-8B parameters) to match larger proprietary rewriters while maintaining BM25's speed efficiency, and demonstrates zero-shot transfer across 18 languages.
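
To make the idea of a BM25 reward concrete, the sketch below scores a known-relevant document under an original query and under an expanded one. It assumes the reward is simply the Okapi BM25 score of that document, which is a simplification chosen for illustration; the summary does not specify STORM's actual reward or its beam search. The corpus, queries, and `bm25_score` helper are hypothetical.

```python
import math
from collections import Counter


def bm25_score(query, doc_id, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a whitespace-tokenized query."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs
    doc_terms = tokenized[doc_id]
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query.lower().split()):
        if tf[term] == 0:
            continue  # terms absent from the document contribute nothing
        df = sum(1 for t in tokenized.values() if term in t)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


# Hypothetical corpus; d2 is the document we want the query to find.
docs = {
    "d1": "neural ranking models for web search",
    "d2": "cardiac arrest treatment and heart attack prevention",
    "d3": "stock market news",
}

original_reward = bm25_score("myocardial infarction", "d2", docs)
expanded_reward = bm25_score("myocardial infarction heart attack", "d2", docs)
print(original_reward, expanded_reward)
```

The original query shares no terms with the relevant document and earns no reward, while the expansion adds matching vocabulary and earns a positive one. A signal of this kind requires no human labels beyond knowing which document should be retrieved.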