🧠 AI | 🔴 Bearish | Importance 7/10

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

arXiv – CS AI | HuiMing Fan, Xiao Wang, Zheng Chu, Qianyu Wang, Zhuoyao Wang, Ming Liu, Bing Qin, XingYu | May 28, 2026 at 04:00 AM

🤖 AI Summary

Researchers report that LLM-based search agents often rely on intrinsic knowledge rather than genuinely searching the web, with up to 44.5% of answers generated without any tool use. The new LiveBrowseComp benchmark, which tests agents on facts published within the preceding 90 days, pushes every evaluated agent below 2% accuracy and exposes fundamental limitations in how search-augmented AI is currently evaluated.

Analysis

This research exposes a gap between how search agents are presented and how they actually behave. Search-augmented language models are marketed as tools for real-time information retrieval, but the study finds that they mostly use the web to verify what they already know from pre-training rather than to discover new information. Two findings show how brittle this is: agents answer up to 44.5% of questions without attempting retrieval, and removing answer-supporting evidence drops performance below zero-shot baselines. Static benchmarks like BrowseComp inadvertently reward this memory-backed verification, which inflates scores and misrepresents actual search capability.

LiveBrowseComp addresses this methodological blind spot in AI evaluation. By restricting questions to facts published within 90 days and filtering out globally salient events, the benchmark prevents models from falling back on knowledge acquired before their training cutoff. Performance collapses across all agents, from respectable scores on BrowseComp to below 2% accuracy, which suggests that current search agents lack genuine information-seeking behavior. This matters most for deployment scenarios that require discovering genuinely new information.
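
The recency constraint described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Question` record, the `globally_salient` flag, and the `select_live_questions` helper are hypothetical names introduced here, and the summary does not say how salience is actually determined.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Question:
    text: str
    fact_published: date    # assumed: date the supporting fact first appeared
    globally_salient: bool  # assumed: flag for widely covered events


def select_live_questions(questions, today, window_days=90):
    """Keep questions whose fact is inside the window and not globally salient."""
    cutoff = today - timedelta(days=window_days)
    return [
        q for q in questions
        if cutoff <= q.fact_published <= today and not q.globally_salient
    ]
```

Under these assumptions, a question about a fact published months before the window, or about an event flagged as globally salient, would be dropped before evaluation.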

For developers and organizations building search-augmented systems, this work signals that existing evaluation frameworks provide false confidence. Prior model rankings no longer predict LiveBrowseComp performance, which indicates that fundamental architectural changes may be necessary. Real-world applications in research, journalism, and financial analysis need agents that actively discover information rather than merely verify it, so this limitation is critical for practical deployment. The research suggests the field should stop treating search as a secondary augmentation and instead redesign agents to prioritize evidence-driven reasoning over reliance on intrinsic knowledge.

Key Takeaways

→ LLM search agents answer up to 44.5% of questions without using tools, relying instead on pre-trained knowledge
→ Agents generate over half their search queries from internal hypotheses rather than retrieved information
→ LiveBrowseComp benchmark shows all evaluated agents achieving below 2% accuracy on recent-fact questions
→ Static benchmarks conflate memory-backed verification with genuine search capability, masking actual limitations
→ Current model rankings on traditional benchmarks fail to predict performance on genuine discovery tasks
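
A statistic like the tool-free answer share in the first takeaway could be computed from logged agent runs roughly as follows. This is a hedged sketch, not the paper's measurement code: the `Trajectory` record and `tool_free_answer_rate` function are hypothetical, and the choice of denominator (answered questions only) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trajectory:
    question_id: str
    tool_calls: List[str] = field(default_factory=list)  # assumed: logged search/browse calls
    answer: Optional[str] = None                         # None if the agent gave no answer


def tool_free_answer_rate(trajectories):
    """Fraction of answered questions where the agent made no tool call."""
    answered = [t for t in trajectories if t.answer is not None]
    if not answered:
        return 0.0
    tool_free = sum(1 for t in answered if not t.tool_calls)
    return tool_free / len(answered)
```

Counting only answered questions keeps abstentions from diluting the rate; a different denominator would give a different number.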


#llm-agents #search-capability #benchmark-evaluation #ai-limitations #intrinsic-knowledge #information-retrieval #model-evaluation #agent-behavior

