FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents

arXiv – CS AI | Eric Y. Kim, Jie Huang | March 6, 2026 at 05:00 AM

🤖AI Summary

Researchers introduced FinRetrieval, a benchmark that tests how well AI agents retrieve financial data, and evaluated 14 agent configurations across major providers. Tool availability dominated performance: Claude Opus reached 90.8% accuracy with structured APIs but only 19.8% with web search alone.

Key Takeaways

→The FinRetrieval benchmark comprises 500 financial retrieval questions with ground-truth answers, used to evaluate 14 AI agent configurations.
→Tool availability is the dominant performance factor: Claude Opus shows a 71 percentage point gap between structured API access and web search alone (90.8% versus 19.8%).
→The benefit of reasoning mode varies inversely with base capability: OpenAI gains 9.0pp, while Claude gains only 2.8pp.
→A 5.6pp geographic performance gap favoring US data stems from fiscal year naming conventions rather than model limitations.
→The complete dataset, evaluation code, and tool traces are publicly released for further financial AI research.
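The takeaways above report differences in percentage points (pp), which are absolute differences between two accuracy figures, not relative changes. As a minimal sketch of that arithmetic, using only the two Claude Opus accuracies quoted in the summary (the function names are hypothetical and are not part of the released evaluation code):

```python
def percentage_point_gap(a_pct: float, b_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a_pct - b_pct, 1)


def relative_ratio(a_pct: float, b_pct: float) -> float:
    """How many times larger a_pct is than b_pct (a relative comparison)."""
    return a_pct / b_pct


# Accuracy figures quoted in the summary for Claude Opus.
api_accuracy = 90.8   # structured APIs
web_accuracy = 19.8   # web search alone

gap = percentage_point_gap(api_accuracy, web_accuracy)
ratio = relative_ratio(api_accuracy, web_accuracy)

print(f"gap: {gap} pp")        # gap: 71.0 pp
print(f"ratio: {ratio:.1f}x")  # ratio: 4.6x
```

The same gap can therefore be read two ways: 71 percentage points in absolute terms, or roughly 4.6 times the web-search-only accuracy in relative terms.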
