🧠 AI · ⚪ Neutral · Importance 6/10

TimeSeek: Temporal Reliability of Agentic Forecasters

arXiv – CS AI | Hamza Mostafa, Om Shastri, Dennis Lee | April 7, 2026 at 04:00 AM

🤖 AI Summary

TimeSeek introduces a benchmark for forecasting binary prediction-market outcomes. It finds that AI language models are most competitive early in a market's lifecycle and on high-uncertainty markets, but degrade near resolution and on strong-consensus markets. Web search improves forecasting accuracy overall, though not uniformly across models, and simple ensembles reduce forecasting error without beating the market overall.

Key Takeaways

→ AI forecasting models are most competitive early in prediction markets and on high-uncertainty outcomes.
→ Model performance degrades significantly near market resolution and on strong-consensus markets.
→ Web search improves overall forecasting accuracy but hurts performance in 12% of model-checkpoint pairs.
→ Simple two-model ensembles reduce forecasting errors without surpassing market performance.
→ The research suggests time-aware evaluation and selective-deference policies are more effective than uniform approaches.
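
To make the last two takeaways concrete, here is a minimal Python sketch of a two-model ensemble and a selective-deference policy. It is an illustration under stated assumptions, not the paper's method: the summary does not name the error metric, so the Brier score is assumed here, and the `Snapshot` fields, the `late` and `consensus` thresholds, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Snapshot:
    """One forecast checkpoint for a binary market (hypothetical schema)."""
    market_price: float   # market-implied probability of "yes"
    model_a: float        # first model's probability of "yes"
    model_b: float        # second model's probability of "yes"
    life_fraction: float  # 0.0 = market just opened, 1.0 = resolution
    outcome: int          # resolved outcome: 1 = yes, 0 = no


def brier(p: float, outcome: int) -> float:
    """Squared error between a probability and the 0/1 outcome (assumed metric)."""
    return (p - outcome) ** 2


def ensemble(s: Snapshot) -> float:
    """Simple two-model ensemble: unweighted mean of the two probabilities."""
    return (s.model_a + s.model_b) / 2


def selective_deference(s: Snapshot, late: float = 0.8, consensus: float = 0.15) -> float:
    """Defer to the market near resolution or under strong consensus.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    near_resolution = s.life_fraction >= late
    strong_consensus = min(s.market_price, 1 - s.market_price) <= consensus
    if near_resolution or strong_consensus:
        return s.market_price
    return ensemble(s)


def mean_brier(snapshots: Sequence[Snapshot], forecaster: Callable[[Snapshot], float]) -> float:
    """Average Brier score of a forecaster over a set of checkpoints."""
    return sum(brier(forecaster(s), s.outcome) for s in snapshots) / len(snapshots)
```

Calling `mean_brier(snapshots, ensemble)` and `mean_brier(snapshots, selective_deference)` on the same checkpoints, ideally grouped by `life_fraction`, gives a time-aware comparison of the two policies against `mean_brier(snapshots, lambda s: s.market_price)`.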