AIBearisharXiv – CS AI · 3h ago7/10
🧠
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?
Researchers reveal that LLM-based search agents often rely on intrinsic knowledge rather than genuinely searching the web, with up to 44.5% of answers generated without tool use. The new LiveBrowseComp benchmark, designed to test agents on recent facts within 90 days, shows all evaluated agents drop below 2% accuracy and exposes fundamental limitations in current search-augmented AI evaluation.
🏢 Hugging Face