y0news

#pretraining-efficiency News & Analysis

2 articles tagged with #pretraining-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Researchers identify a fundamental flaw in current FP4 training approaches for large language models: E2M1 formats suffer from systematic "Shrinkage Bias" that degrades training stability. They propose UFP4, a uniform 4-bit recipe using E1M2/INT4 grids that outperforms existing E2M1 baselines across multiple model scales, suggesting future AI accelerators should prioritize uniform grid formats for training.

🏢 Nvidia
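
To make the grid distinction in the summary above concrete, here is a minimal sketch comparing the spacing of the E2M1 magnitude grid with a uniform 8-level grid over the same range. It does not reproduce the paper's UFP4 recipe or its definition of Shrinkage Bias. The helper names, the choice of range [0, 6] for the uniform grid, and the simple nearest-value rounding are all assumptions made for illustration.

```python
# Illustrative only: compares the spacing of two 4-bit magnitude grids.
# All names here are hypothetical and not taken from the paper.

# E2M1 (1 sign, 2 exponent, 1 mantissa bit): non-negative representable values.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# A uniform 8-level magnitude grid scaled to the same range [0, 6] (assumed scaling).
UNIFORM_GRID = [6.0 * k / 7 for k in range(8)]


def quantize(x, grid):
    """Round |x| to the nearest grid value and keep the sign.

    Values beyond the largest grid point saturate to it. Exact ties resolve
    to the smaller magnitude here, which is a simplification of the
    rounding modes real hardware uses.
    """
    mag = min(grid, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def gaps(grid):
    """Spacing between consecutive grid points."""
    return [b - a for a, b in zip(grid, grid[1:])]


if __name__ == "__main__":
    print("E2M1 gaps:   ", gaps(E2M1_GRID))
    print("Uniform gaps:", [round(g, 3) for g in gaps(UNIFORM_GRID)])
    for x in (0.3, 2.4, -5.2, 10.0):
        print(x, quantize(x, E2M1_GRID), round(quantize(x, UNIFORM_GRID), 3))
```

The E2M1 gaps widen from 0.5 to 2.0 as magnitude grows, while the uniform grid keeps a constant step; that contrast is the only thing this sketch is meant to show.
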
AI · Bullish · arXiv – CS AI · May 11 · 7/10

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Researchers propose a gradient-based bilevel optimization method that automatically learns composite loss weights during pretraining by aligning gradients with downstream objectives. The approach reduces hyperparameter tuning overhead to ~30% above baseline training cost while matching or exceeding manually tuned baselines across event-sequence and computer vision tasks.
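
The idea of learning loss weights from gradient alignment can be sketched with a generic one-step-unrolled hypergradient. This is a textbook approximation, not the paper's algorithm. After an inner step θ' = θ − η Σᵢ wᵢ gᵢ, the first-order derivative of the downstream loss with respect to wᵢ is −η ⟨g_down, gᵢ⟩, so a component whose gradient aligns with the downstream gradient gains weight. The function name, the clipping at zero, and the renormalization to sum to one are assumptions made for illustration.

```python
# Illustrative only: one outer update of composite loss weights using a
# one-step-unrolled hypergradient. Names and normalization are hypothetical.

def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def update_loss_weights(weights, component_grads, downstream_grad,
                        inner_lr, outer_lr):
    """Return updated weights for the component losses.

    For each component i, the first-order hypergradient is
    -inner_lr * <downstream_grad, component_grads[i]>.
    A gradient-descent step on it raises w_i when the two gradients align.
    """
    updated = []
    for w, g in zip(weights, component_grads):
        hypergrad = -inner_lr * dot(downstream_grad, g)
        # Assumption: keep weights non-negative.
        updated.append(max(0.0, w - outer_lr * hypergrad))
    total = sum(updated)
    # Assumption: renormalize so the weights sum to one.
    return [w / total for w in updated] if total > 0 else list(weights)


if __name__ == "__main__":
    # Component 0 points the same way as the downstream gradient; component 1 opposes it.
    w = update_loss_weights(
        weights=[0.5, 0.5],
        component_grads=[[1.0, 0.0], [-1.0, 0.0]],
        downstream_grad=[1.0, 0.0],
        inner_lr=0.1,
        outer_lr=1.0,
    )
    print(w)
```

In this toy case the aligned component's weight rises and the opposing one falls. A real pretraining setup would compute these gradients with an autodiff framework over model parameters.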