🧠 AI | ⚪ Neutral | Importance 6/10

Leakage-Aware Benchmarking of LLM Forecasting: Real-Time Nowcasts as the Decision-Time Input for Macro Factor Ranking

arXiv – CS AI|Mao Guan, Qian Chen|June 23, 2026 at 04:00 AM

🤖AI Summary

Researchers benchmark a retrieval-augmented LLM system for equity factor ranking using strictly decision-time information, avoiding the data leakage common in forecasting benchmarks. The 7B system achieves a modest positive median monthly Spearman information coefficient (IC) of +0.154, comparable to a simpler kNN baseline. This suggests that real-time macro data and historical analogs drive most of the signal, while the LLM may add marginal value at the extremes of the ranking.

Analysis

This research addresses a critical methodological flaw in LLM forecasting benchmarks: the widespread practice of training or evaluating models on features that would not have been available at the actual decision time, which artificially inflates apparent performance. The authors construct a rigorous experimental framework spanning three years (April 2023 to March 2026) in which their LLM system observes only information truly available at month-end when it makes equity factor ranking decisions. The pipeline combines macro-analog retrieval with critic and actor LLMs to score seven U.S. equity style factors, achieving a median rank correlation of +0.154 that is consistent across multiple 12-month windows. The result is not statistically significant, however, because the confidence interval includes zero.
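To make the headline metric concrete, here is a minimal sketch of how a monthly Spearman IC and a confidence interval for its median could be computed. It assumes a percentile bootstrap over months, which the summary does not confirm as the authors' method; the function names, the zero-IC convention for constant rankings, and the toy numbers are all hypothetical.

```python
import random
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_ic(scores, realized):
    """Spearman rank correlation between predicted scores and realized returns."""
    rx, ry = average_ranks(scores), average_ranks(realized)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant ranking counts as zero IC
    return cov / (vx * vy) ** 0.5


def bootstrap_median_ci(monthly_ics, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the median monthly IC (assumed method)."""
    rng = random.Random(seed)
    n = len(monthly_ics)
    medians = sorted(
        statistics.median(rng.choices(monthly_ics, k=n)) for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only: one month of scores and returns for seven unnamed factors.
    toy_scores = [0.9, 0.1, 0.4, 0.7, 0.2, 0.5, 0.3]
    toy_returns = [0.02, -0.01, 0.00, 0.01, 0.03, -0.02, 0.01]
    print(spearman_ic(toy_scores, toy_returns))
```

With only seven factors per month and 36 months, each monthly IC is noisy, which is one reason an interval around the median can include zero even when the point estimate is positive.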

The study reveals that much of the LLM system's predictive power derives not from sophisticated language understanding but from simpler components: lagged macro variables, recent event summaries, and Cleveland Fed inflation nowcasts. A non-LLM kNN baseline subject to the same decision-time constraint recovers comparable median performance, suggesting that selecting macro-similar historical states explains the bulk of the signal. The LLM's potential advantage appears in the extreme rank scores that matter for long-short portfolio construction, which indicates that language models may extract nuance from edge cases rather than improve bulk predictions.
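The kNN idea can be illustrated with a short sketch: find the past months whose macro state most resembles the current one, then score each factor by its average return in the month that followed. This is an assumed, simplified form of such a baseline, not the authors' implementation; the Euclidean distance, the value of `k`, and every name below are hypothetical.

```python
import math


def eligible_history(states, fwd_returns, t):
    """
    Build the analog pool available at decision month t.

    states[s]      : macro feature vector observed at the end of month s
    fwd_returns[s] : factor returns over month s + 1, known at the end of s + 1

    An outcome is usable at month t only if s + 1 <= t, so the pool is
    months 0 .. t - 1. Anything later would leak future information.
    """
    return [(states[s], fwd_returns[s]) for s in range(t)]


def knn_analog_scores(history, current_state, k=3):
    """Score each factor by its mean forward return over the k nearest analogs."""
    if not history:
        raise ValueError("need at least one past month with a realized outcome")
    nearest = sorted(
        history, key=lambda row: math.dist(row[0], current_state)
    )[:k]
    n_factors = len(nearest[0][1])
    return [
        sum(row[1][f] for row in nearest) / len(nearest)
        for f in range(n_factors)
    ]


if __name__ == "__main__":
    # Toy data only: two macro features, two factors, four months.
    states = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [0.4, 0.4]]
    fwd_returns = [[0.01, -0.01], [0.03, 0.01], [-0.5, 0.5], [0.0, 0.0]]
    pool = eligible_history(states, fwd_returns, t=3)
    print(knn_analog_scores(pool, states[3], k=2))
```

The slicing in `eligible_history` is where the decision-time constraint lives: the pool excludes any month whose outcome had not yet been realized at the decision date.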

For AI practitioners and quant investors, this study supports caution about benchmark inflation while demonstrating that the value LLMs add to forecasting remains context-dependent and marginal. The work establishes a rigorous, leakage-aware experimental methodology for future LLM forecasting research. Practitioners should treat with skepticism any forecasting claims that lack explicit decision-time constraints, and should recognize that simpler baselines often capture the majority of achievable signal in macro prediction tasks.

Key Takeaways

→ LLM forecasting benchmarks commonly suffer from data leakage; this study enforces strict decision-time information constraints to measure true capability.
→ A 7B LLM system achieved a median monthly Spearman IC of +0.154 on equity factor ranking; the result is statistically underpowered but consistent across subperiods.
→ A non-LLM kNN baseline recovered comparable median performance, suggesting that real-time macro data and historical similarity drive most of the predictive signal.
→ The LLM showed a marginal advantage concentrated in the extreme rankings used for long-short portfolio formation rather than in bulk predictions.
→ The research establishes methodological best practices for leakage-aware LLM forecasting that should inform future benchmarking in financial prediction.
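The point about extreme rankings can be made concrete with a sketch of a top-minus-bottom spread, where only the highest- and lowest-scored factors affect the result. This is an illustrative construction under assumed equal weighting; the function name, `n_extreme`, and the toy values are hypothetical and not taken from the paper.

```python
def long_short_spread(scores, realized, n_extreme=2):
    """
    Return of a portfolio that is long the n_extreme highest-scored factors
    and short the n_extreme lowest-scored ones, equally weighted within
    each leg. Factors in the middle of the ranking have no effect.
    """
    if 2 * n_extreme > len(scores):
        raise ValueError("long and short legs would overlap")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom, top = order[:n_extreme], order[-n_extreme:]
    long_leg = sum(realized[i] for i in top) / n_extreme
    short_leg = sum(realized[i] for i in bottom) / n_extreme
    return long_leg - short_leg


if __name__ == "__main__":
    # Toy data only: seven unnamed factors for a single month.
    toy_scores = [7, 6, 5, 4, 3, 2, 1]
    toy_returns = [0.04, 0.03, 0.0, 0.0, 0.0, -0.01, -0.02]
    print(long_short_spread(toy_scores, toy_returns))
```

Because the spread ignores the middle of the ranking, two systems with similar median IC can still differ in long-short results if one orders the extremes better.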

#llm-forecasting #data-leakage #equity-factors #macro-prediction #benchmark-methodology #retrieval-augmented #quantitative-finance #model-evaluation

Read Original → via arXiv – CS AI
