AI · Neutral · arXiv - CS AI · 14h ago · 6/10
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Researchers introduce TimeSeriesExamAgent, a scalable framework for automatically generating time series reasoning benchmarks using LLM agents and templates. The study reveals that while large language models show promise in time series tasks, they significantly underperform in abstract reasoning and domain-specific applications across healthcare, finance, and weather domains.