y0news

#information-retrieval News & Analysis

57 articles tagged with #information-retrieval. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

OmniRetrieval is a new framework that enables unified retrieval across heterogeneous knowledge sources—including unstructured text, relational databases, knowledge graphs, and property graphs—by translating natural language queries into source-native queries rather than forcing all data into a homogenized format. The system demonstrates superior performance compared to single-source retrievers across 13 datasets and 309 knowledge bases, positioning it as a general-purpose interface that preserves the structural advantages of each knowledge source.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Researchers introduce Single-stage Sparse Retrieval (SSR), a new approach that replaces clustering-based compression with sparse autoencoders for multi-vector retrieval systems. The method achieves 15x faster indexing, 50% lower retrieval latency, and improved accuracy compared to ColBERTv2, addressing critical efficiency bottlenecks in large-scale information retrieval.

AI · Bearish · arXiv – CS AI · 3d ago · 7/10

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

Researchers reveal that LLM-based search agents often rely on intrinsic knowledge rather than genuinely searching the web, with up to 44.5% of answers generated without tool use. The new LiveBrowseComp benchmark, designed to test agents on recent facts within 90 days, shows all evaluated agents drop below 2% accuracy and exposes fundamental limitations in current search-augmented AI evaluation.

🏢 Hugging Face
AI · Bullish · arXiv – CS AI · 4d ago · 7/10

ICICLE: Expanding Retrieval with In-Context Documents

Researchers introduce ICICLE, a generative retrieval framework that addresses the inefficiency of traditional corpus expansion by treating new documents as in-context evidence rather than requiring model retraining. The approach uses a copy-based routing mechanism to distinguish between parametric memory and context-provided document associations, achieving better scalability without catastrophic forgetting.

AI · Bullish · Google AI Blog · May 19 · 7/10

A new era for AI Search

A major technology company announced a significant advancement in search technology by integrating artificial intelligence capabilities with traditional search engine functionality. This development represents a strategic shift toward hybrid search solutions that combine AI's generative and analytical strengths with search engines' indexing and retrieval capabilities.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

Researchers introduce WiCER, an iterative algorithm that solves the "compilation gap" in LLM Wiki systems—the problem of distilling raw documents into persistent knowledge artifacts without losing critical facts. The method recovers 80% of lost quality and reduces catastrophic failures by 55%, outperforming naive compilation approaches while maintaining sub-second latency advantages over traditional RAG systems.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

Researchers introduce SkillRet, a large-scale benchmark dataset containing 17,810 public agent skills designed to evaluate how language model agents retrieve appropriate tools from massive skill libraries. The benchmark demonstrates that current retrieval methods struggle significantly with realistic large-scale deployments, though task-specific fine-tuning improves performance by up to 16.9 points on key metrics.

AI · Neutral · arXiv – CS AI · May 4 · 7/10

LLM-Oriented Information Retrieval: A Denoising-First Perspective

Researchers propose that information retrieval for LLMs requires a fundamental shift toward denoising—prioritizing signal quality over quantity—because unlike humans, language models are vulnerable to hallucinations when processing noisy or irrelevant data within limited context windows. The paper introduces a four-stage framework addressing IR challenges from inaccessibility to unverifiability, with practical applications across RAG systems, coding agents, and multimodal understanding.

AI · Bearish · arXiv – CS AI · May 1 · 7/10

How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

A comprehensive empirical study reveals that generative AI is fundamentally reshaping web search by retrieving different sources and presenting information differently than traditional search engines. The research finds that AI Overviews appear in over half of queries, tend to prioritize Google-owned content over institutional sources, and show lower consistency and robustness compared to standard search results.

🧠 Gemini
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for external retrieval controllers. The system enables models to autonomously decide when to retrieve information, reformulate queries, and terminate retrieval within a single autoregressive process, achieving competitive performance with GPT-4o while using substantially fewer parameters.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

Researchers introduce Q+, a structured reasoning toolkit that enhances AI research agents by making web search more deliberate and organized. Integrated into Eigent's browser agent, Q+ demonstrates consistent benchmark improvements of 0.6 to 3.8 percentage points across multiple deep-research tasks, suggesting meaningful progress in autonomous AI agent reliability.

🏢 Anthropic · 🧠 GPT-4 · 🧠 GPT-5
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

Towards AI Search Paradigm

Researchers introduce the AI Search Paradigm, a comprehensive framework for next-generation search systems using four LLM-powered agents (Master, Planner, Executor, Writer) that collaborate to handle everything from simple queries to complex reasoning tasks. The system employs modular architecture with dynamic workflows for task planning, tool integration, and content synthesis to create more adaptive and scalable AI search capabilities.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10

Retrieval-Augmented Robots via Retrieve-Reason-Act

Researchers introduce Retrieval-Augmented Robotics (RAR), a new paradigm enabling robots to actively retrieve and use external visual documentation to execute complex tasks. The system uses a Retrieve-Reason-Act loop where robots search unstructured visual manuals, align 2D diagrams with 3D objects, and synthesize executable plans for assembly tasks.

AI · Bullish · arXiv – CS AI · 2d ago · 6/10

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Researchers introduce CoHyDE, an iterative co-training method that jointly optimizes a dense encoder and LLM rewriter to improve tool retrieval for AI agents. The approach outperforms single-component baselines by 2.5-8 percentage points on standard and vague queries, addressing the fundamental challenge of bridging colloquial user language with technical API vocabularies.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

CORE-T: COherent REtrieval of Tables for Text-to-SQL

CORE-T introduces a training-free framework for improving table retrieval in text-to-SQL systems by combining dense retrieval with LLM-generated metadata and compatibility caching. The approach achieves significant performance gains—up to 22.7 points in table-selection F1 and 24.4 points in multi-table execution accuracy—while reducing inference tokens by 64-76% compared to LLM-intensive alternatives.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering

Researchers introduce HiKEY, a hierarchical multimodal retrieval framework designed to improve document-based question answering systems by leveraging document structure as a core retrieval signal. The system addresses critical limitations in existing approaches by implementing a coarse-to-fine retrieval strategy and demonstrating significant performance improvements on ODQA benchmarks.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

GrepSeek: Training Search Agents for Direct Corpus Interaction

Researchers introduce GrepSeek, an AI search agent that interacts directly with text corpora using shell commands rather than traditional retrieval indexes. The system combines supervised learning with reinforcement optimization to achieve state-of-the-art results on question-answering benchmarks while operating at scale through parallel execution techniques.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Xetrieval: Mechanistically Explaining Dense Retrieval

Researchers introduce Xetrieval, a mechanistic framework that explains how dense retrieval models assign relevance scores by decomposing high-dimensional embeddings into interpretable features. The method uses a lightweight reasoning internalizer to enrich embeddings with reasoning information and provides human-readable feature-level explanations of retrieval decisions, advancing transparency in neural information retrieval systems.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth

Researchers demonstrate that deep literature search pipelines dramatically improve retrieval performance (from ~20% to 80% recall) compared to basic API searches, while simultaneously revealing that human citation lists contain significant bias and are unsuitable as ground truth for evaluation. The study advocates for multi-dimensional evaluation metrics beyond simple recall to assess citation quality accurately.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

Researchers demonstrate that dense neural retrievers contain extractable sparse features matching BM25-ready vocabularies without specialized training. Sparse Autoencoders can decompose frozen dense retrievers into classical sparse retrieval components, achieving competitive or superior performance to single-vector methods while requiring no retrieval-specific supervision.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Researchers introduce HGMem, a hypergraph-based working memory system that enhances multi-step retrieval-augmented generation (RAG) for large language models by modeling complex relational dependencies among facts. Unlike traditional RAG systems that treat memory as passive storage, HGMem dynamically structures information as interconnected high-order relationships, demonstrating improved performance on global sense-making benchmarks requiring complex reasoning across extended contexts.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Dr-CiK: A Testbed for Foresight-Driven Agents

Researchers introduce Dr-CiK, a benchmark for testing whether AI agents can independently retrieve relevant context from noisy document sources to improve time series forecasting. Evaluation reveals current information retrieval agents recover less than 5% of supporting evidence and are frequently misled by irrelevant information, highlighting a critical gap in foresight-driven AI development.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

A Systematic Evaluation of Retrieval-Augmented Generation and Language Models for Space Operations

Researchers systematically evaluate Retrieval-Augmented Generation (RAG) pipelines that combine Large Language Models with information retrieval techniques for space operations. The study demonstrates that RAG systems can effectively process vast technical documentation and operational guidelines, enhancing decision-making accuracy and reliability in complex space environments.

AI · Neutral · arXiv – CS AI · 3d ago · 6/10

Checking Fact with Better Retrieval: Dynamic Contrastive Learning for Evidence Retrieval

Researchers propose DACLR, a dynamic contrastive learning method that improves evidence retrieval for multimodal fact-checking by converting diverse media types to text and extracting event-level features. The approach uses a two-stage recall-rerank system with adaptive loss functions to better match claims with relevant evidence rather than merely semantically similar content.

AI · Neutral · arXiv – CS AI · 3d ago · 5/10

Eliot: Interactively Exploring Fast-Changing Scientific Literature Trends with Online Data and Learning

Researchers present Eliot, an interactive system for exploring evolving scientific literature trends across rapidly changing fields like Large Language Models and Automated Planning. The tool retrieves arXiv papers at query time, clusters them into thematic groups, and visualizes publication patterns over time, with evaluations showing 85% accuracy in meaningful cluster labeling across eight research domains.
