AI · Bearish · arXiv – CS AI · Apr 10 · 7/10
Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research
Researchers found that GPT-4o shows significant daily and weekly performance fluctuations when solving identical tasks under fixed conditions, with periodic variability accounting for approximately 20% of total variance. The finding challenges the widespread assumption that LLM performance is time-invariant and raises concerns about the reliability and reproducibility of research that relies on large language models.
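To make "share of total variance" concrete, here is a minimal sketch of one way to estimate how much of the variance in repeated scores is explained by time of day or hour of week. It is an illustration under stated assumptions, not the paper's method: it uses a simple between-group variance ratio (eta squared), and the names `periodic_variance_share`, `hour_of_day`, and `hour_of_week` are hypothetical, as is the synthetic data.

```python
import math
import random
from collections import defaultdict
from datetime import datetime, timedelta


def periodic_variance_share(timestamps, scores, key):
    """Fraction of total variance explained by grouping scores on key(timestamp).

    This is eta squared from a one-way analysis of variance:
    between-group sum of squares divided by total sum of squares.
    """
    n = len(scores)
    grand_mean = sum(scores) / n
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    if total_ss == 0:
        return 0.0  # constant scores: nothing to explain

    groups = defaultdict(list)
    for t, s in zip(timestamps, scores):
        groups[key(t)].append(s)

    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


def hour_of_day(t):
    """Grouping key for a daily cycle."""
    return t.hour


def hour_of_week(t):
    """Grouping key for a weekly cycle (captures daily structure as well)."""
    return (t.weekday(), t.hour)


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): one score per hour
    # for four weeks, with a small daily sinusoid plus random noise.
    rng = random.Random(0)
    start = datetime(2024, 1, 1)
    timestamps = [start + timedelta(hours=h) for h in range(24 * 28)]
    scores = [
        0.7 + 0.05 * math.sin(2 * math.pi * t.hour / 24) + rng.gauss(0, 0.1)
        for t in timestamps
    ]
    print("daily share: ", periodic_variance_share(timestamps, scores, hour_of_day))
    print("weekly share:", periodic_variance_share(timestamps, scores, hour_of_week))
```

With few observations per group, this ratio is biased upward, because group means absorb some of the noise. A study would typically correct for that or fit an explicit periodic model instead.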
🧠 GPT-4