AI · Neutral · arXiv – CS AI · 7h ago · 6/10
LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services
Researchers introduced LocalSearchBench, a benchmark for evaluating AI agents on search tasks in local life services. It comprises 1.3M merchant entries and 900 multi-hop reasoning tasks. Even state-of-the-art large reasoning models show significant performance gaps on it, with critical weaknesses in completeness and faithfulness that underscore the need for domain-specific AI agent development.