arXiv · cs.AI
What Scales in Cross-Entropy Scaling Law?
Researchers found that the traditional cross-entropy scaling law for large language models breaks down at very large scale: when the loss is decomposed, only one component, the error-entropy, actually follows a power law, while the remaining components stay roughly constant. This explains why loss improvements become less predictable as models grow, and it motivates a new error-entropy scaling law as a more reliable lens on LLM development.
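A minimal sketch of the idea on synthetic numbers (the component values, exponents, and decomposition here are illustrative assumptions, not the paper's data or method): if cross-entropy is an error-entropy term that follows a power law in model size plus a roughly constant remainder, then fitting a pure power law to the total loss shows systematic error, while the same fit to the error-entropy component alone stays accurate.

```python
# Illustrative sketch with synthetic data (assumed values, not the paper's):
# total cross-entropy = error-entropy (power law in N) + constant remainder.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha):
    """Pure power law a * N^(-alpha), the form of the traditional scaling law."""
    return a * n ** (-alpha)

# Hypothetical model sizes (parameter counts) and loss components.
n = np.logspace(7, 11, 9)            # 1e7 .. 1e11 parameters
error_entropy = 12.0 * n ** (-0.08)  # assumed power-law component
constant_part = 1.6                  # assumed non-scaling components
cross_entropy = error_entropy + constant_part

# Fit a pure power law to the total loss and to the error-entropy alone.
p_total, _ = curve_fit(power_law, n, cross_entropy, p0=(10.0, 0.1))
p_error, _ = curve_fit(power_law, n, error_entropy, p0=(10.0, 0.1))

for name, params, y in [("cross-entropy", p_total, cross_entropy),
                        ("error-entropy", p_error, error_entropy)]:
    rel_err = np.abs(power_law(n, *params) - y) / y
    print(f"{name}: a={params[0]:.3f}, alpha={params[1]:.4f}, "
          f"max relative fit error={rel_err.max():.2%}")
```

Under these assumptions, the power-law fit to the error-entropy component is essentially exact, while the fit to the total cross-entropy leaves a visible residual because the constant part bends the curve on a log-log plot, which is the qualitative breakdown the summary describes.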