🧠 AI | Neutral | Importance 6/10

FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents

arXiv – CS AI | Eric Y. Kim, Jie Huang
🤖AI Summary

Researchers introduced FinRetrieval, a benchmark that tests AI agents' ability to retrieve financial data, and used it to evaluate 14 configurations across major providers. Tool availability had the largest effect on performance: Claude Opus reached 90.8% accuracy with structured APIs but only 19.8% with web search alone.

Key Takeaways
  • The FinRetrieval benchmark comprises 500 financial retrieval questions with ground-truth answers, used to evaluate 14 AI agent configurations.
  • Tool availability is the dominant performance factor: Claude Opus shows a 71 percentage point gap between structured API access and web search alone.
  • The benefit of reasoning mode varies inversely with base capability: OpenAI gains 9.0pp, while Claude gains 2.8pp.
  • Geographic performance gaps of 5.6pp favoring US data stem from fiscal year naming conventions rather than model limitations.
  • The complete dataset, evaluation code, and tool traces are publicly released for further financial AI research.
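
The percentage-point figures above can be checked with simple arithmetic. The following is a minimal sketch, not the paper's evaluation code: the `accuracy` helper assumes exact-match scoring purely for illustration, and both function names are hypothetical. Only the 90.8% and 19.8% figures come from the summary.

```python
def accuracy(predictions, ground_truth):
    """Fraction of answers that match the ground truth.

    Exact-match scoring is an assumption made for this sketch;
    the benchmark's real scoring rules may differ.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def gap_pp(acc_a_pct, acc_b_pct):
    """Difference between two accuracies, in percentage points."""
    return round(acc_a_pct - acc_b_pct, 1)


# Figures reported in the summary for Claude Opus.
api_accuracy_pct = 90.8   # structured APIs
web_accuracy_pct = 19.8   # web search only

print(gap_pp(api_accuracy_pct, web_accuracy_pct))  # 71.0
```

A percentage-point gap is an absolute difference between two percentages, so 90.8% versus 19.8% gives 71.0pp, not a 71% relative change.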