AINeutralarXiv โ CS AI ยท 5h ago6/10
๐ง
TimeSeek: Temporal Reliability of Agentic Forecasters
TimeSeek introduces a benchmark showing that AI language models perform best at predicting binary market outcomes early in a market's lifecycle and on high-uncertainty markets, but struggle near resolution and on consensus markets. Web search generally improves forecasting accuracy across models, though not uniformly, while simple ensembles reduce errors without beating market performance overall.