arXiv – CS AI
VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora
Researchers introduce VeriTrip, a benchmark that evaluates travel-planning AI agents on their ability to reason over unstructured web corpora rather than structured APIs. It targets critical gaps in existing agent evaluations by testing how agents cope with information noise, contradictory facts, and multimodal content. The results reveal a significant trade-off between autonomous information retrieval and instruction following.