🧠 AI | 🟢 Bullish | Importance 6/10

Automating Forecasting Question Generation and Resolution for AI Evaluation

arXiv – CS AI | Nikos I. Bosse, Peter Mühlbacher, Jack Wildman, Lawrence Phillips, Dan Schwarz | March 11, 2026 at 04:00 AM

🤖 AI Summary

Researchers developed an automated system that uses LLM-powered web research agents to generate and resolve forecasting questions at scale. They used it to create 1,499 diverse real-world questions, with a 96% quality rate. On these questions, more advanced AI models forecast significantly better, and the system could be used to improve AI evaluation benchmarks.

Key Takeaways

→The new automated system generates forecasting questions with a 96% quality rate, exceeding that of human-curated platforms like Metaculus.
→The system resolved the questions with 95% accuracy several months after generating them.
→More advanced AI models showed measurably better forecasting performance with lower Brier scores.
→Question decomposition strategies can significantly improve AI forecasting accuracy when applied systematically.
→The approach enables scalable evaluation of AI forecasting capabilities beyond limited recurring data sources.
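
For readers unfamiliar with the metric in the takeaways: the Brier score is the mean squared difference between forecast probabilities and the binary outcomes (1 if the event happened, 0 if not), so lower is better and a constant 50% forecast scores 0.25. A minimal Python sketch follows. The function name `brier_score` and all probabilities and outcomes are invented for illustration and are not taken from the paper.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical resolved questions: 1 = resolved "yes", 0 = resolved "no".
outcomes = [1, 0, 1, 0]

# Hypothetical forecasters: one confident and mostly right, one always at 50%.
sharper_forecaster = [0.9, 0.1, 0.8, 0.3]
uninformed_forecaster = [0.5, 0.5, 0.5, 0.5]

sharper_score = brier_score(sharper_forecaster, outcomes)
uninformed_score = brier_score(uninformed_forecaster, outcomes)

print(f"sharper forecaster:    {sharper_score:.4f}")
print(f"uninformed forecaster: {uninformed_score:.4f}")
```

In this toy data the sharper forecaster gets the lower score, which is the sense in which a "lower Brier score" indicates better forecasting.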

Mentioned in AI

Models

GPT-5 (OpenAI)

Gemini (Google)