AI Summary
Researchers discovered that the traditional cross-entropy scaling law for large language models breaks down at very large scales because only one component (error-entropy) actually follows power-law scaling, while other components remain constant. This finding explains why model performance improvements become less predictable as models grow larger and establishes a new error-entropy scaling law for better understanding LLM development.
Key Takeaways
- Cross-entropy scaling law fails at very large model scales, causing unpredictable performance improvements.
- Cross-entropy can be decomposed into three components: Error-Entropy, Self-Alignment, and Confidence.
- Only error-entropy follows robust power-law scaling, while the other components remain largely invariant.
- Error-entropy dominates in small models but diminishes proportionally as models grow larger.
- The new error-entropy scaling law provides more accurate predictions for large language model behavior.
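To make the power-law claim concrete, here is a minimal numerical sketch (not the paper's data or exact formulation): if error-entropy follows a power law in model size, E(N) = a · N^(−b), then it is linear in log-log space, and the exponent can be recovered with a simple linear fit. The coefficients `a_true` and `b_true` below are assumed values for illustration only.

```python
import numpy as np

# Hypothetical coefficients for the sketch (not from the paper).
a_true, b_true = 5.0, 0.3

# Synthetic model sizes (parameter counts) and error-entropy values
# generated from the assumed power law E(N) = a * N^(-b).
N = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
E = a_true * N ** (-b_true)

# In log-log space: log E = log a - b * log N, so a degree-1 fit
# recovers the exponent b and the coefficient a.
slope, intercept = np.polyfit(np.log(N), np.log(E), 1)
b_est, a_est = -slope, np.exp(intercept)
print(b_est, a_est)
```

This is the standard way scaling-law exponents are estimated in practice: fit a line to loss (or here, error-entropy) versus model size on log-log axes. The paper's observation is that this fit holds robustly only for the error-entropy component, not for full cross-entropy.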
#scaling-laws #large-language-models #cross-entropy #error-entropy #model-training #ai-research #power-law #llm-development
Read Original via arXiv (CS AI)