Dr-CiK: A Testbed for Foresight-Driven Agents
Researchers introduce Dr-CiK, a benchmark for testing whether AI agents can independently retrieve relevant context from noisy document sources to improve time series forecasting. Evaluation reveals current information retrieval agents recover less than 5% of supporting evidence and are frequently misled by irrelevant information, highlighting a critical gap in foresight-driven AI development.
Dr-CiK addresses a fundamental limitation in current AI forecasting systems: the assumption that relevant context is already available. In real-world applications, forecasters must actively search through heterogeneous data sources to discover supporting evidence, a capability that existing benchmarks fail to evaluate. This research exposes significant weaknesses in state-of-the-art document retrieval agents paired with forecasting models.
The benchmark's findings are striking. Most document retrieval agents recover less than 5% of ground-truth supporting evidence and cite distractors in over 80% of cases, actively degrading forecast accuracy rather than improving it. This suggests current information retrieval pipelines lack the sophistication needed for complex reasoning tasks where distinguishing signal from noise is critical. The problem compounds when agents retrieve irrelevant information that misleads downstream forecasting models.
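The two headline figures can be read as per-task retrieval metrics. The following is a minimal sketch of how such numbers might be computed. It is not the benchmark's own scoring code, and this summary does not give the exact metric definitions. The `RetrievalResult` type, its field names, and the choice to count a task as a distractor case when at least one distractor is cited are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalResult:
    """Hypothetical record of one forecasting task (not from the benchmark)."""
    cited: frozenset[str]        # document IDs the agent cited
    supporting: frozenset[str]   # ground-truth supporting evidence IDs
    distractors: frozenset[str]  # known irrelevant document IDs


def evidence_recall(result: RetrievalResult) -> float:
    """Fraction of ground-truth supporting documents that the agent cited."""
    if not result.supporting:
        return 0.0
    return len(result.cited & result.supporting) / len(result.supporting)


def cites_distractor(result: RetrievalResult) -> bool:
    """True if the agent cited at least one known distractor (assumed definition)."""
    return bool(result.cited & result.distractors)


def summarize(results: list[RetrievalResult]) -> dict[str, float]:
    """Average evidence recall and share of tasks with a cited distractor."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    return {
        "mean_evidence_recall": sum(evidence_recall(r) for r in results) / n,
        "distractor_citation_rate": sum(cites_distractor(r) for r in results) / n,
    }
```

Under these assumed definitions, the reported results would correspond to a `mean_evidence_recall` below 0.05 and a `distractor_citation_rate` above 0.8 for most agents.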
These results have implications for AI systems deployed in financial forecasting, supply chain prediction, and other domains requiring accurate foresight. Organizations relying on retrieval-augmented generation systems should recognize current limitations when these systems operate on noisy, real-world data. The research motivates development of foresight-driven agents that understand contextual relevance to forecasting objectives, not just relevance to query terms.
Future research should focus on improving agent reasoning about what context actually matters for specific prediction tasks. This requires advances in causal inference within information retrieval, better distractor filtering mechanisms, and evaluation frameworks that reward end-to-end forecasting performance rather than retrieval metrics alone. The Dr-CiK benchmark provides a foundation for measuring progress in this critical direction.
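One way to reward end-to-end forecasting performance rather than retrieval metrics alone is to compare forecast error with and without the retrieved context. The sketch below assumes mean absolute error as the error measure and uses a hypothetical `context_gain` score; neither choice is taken from the benchmark.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def context_gain(
    actual: Sequence[float],
    forecast_without_context: Sequence[float],
    forecast_with_context: Sequence[float],
) -> float:
    """Error reduction attributable to retrieved context (hypothetical score).

    Positive: the retrieved context helped the forecast.
    Negative: the retrieved context made the forecast worse.
    """
    baseline = mean_absolute_error(actual, forecast_without_context)
    augmented = mean_absolute_error(actual, forecast_with_context)
    return baseline - augmented
```

A score of this kind penalizes an agent whose citations mislead the forecaster, even if those citations look relevant to the query terms.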
- Current document retrieval agents recover less than 5% of relevant supporting context for forecasting tasks
- Agents cite distractors in over 80% of cases, which degrades forecast accuracy instead of improving it
- Existing context-aided forecasting benchmarks fail to test agents' ability to independently discover relevant information
- High-quality context substantially improves forecasting performance when properly retrieved and filtered
- Development of foresight-driven agents requires advances in causal reasoning and distractor filtering