
#transformer-scaling News & Analysis

2 articles tagged with #transformer-scaling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

🧠 AI · Neutral · arXiv – CS AI · 10h ago · 6/10

Hierarchical Mixture-of-Experts with Two-Stage Optimization

Researchers introduce Hi-MoE, a hierarchical Mixture-of-Experts framework that addresses a fundamental routing trade-off in sparse MoE models through two-stage optimization: inter-group load balancing followed by intra-group expert specialization (a minimal routing sketch follows this entry). Tested on large-scale NLP and vision tasks, Hi-MoE achieves a 5.6% improvement in perplexity and better expert balance than existing methods.

🏢 Meta · 🏢 Perplexity
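
To make the two-stage routing idea concrete, here is a minimal, self-contained sketch of a hierarchical MoE layer: tokens are first routed to an expert group (with a load-balancing auxiliary loss), then routed to a single specialized expert inside that group. All module names, dimensions, and the auxiliary-loss form are illustrative assumptions, not Hi-MoE's exact formulation.

```python
# Minimal two-stage hierarchical MoE routing sketch (illustrative; names,
# sizes, and the balance-loss form are assumptions, not Hi-MoE's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStageMoE(nn.Module):
    def __init__(self, d_model=64, n_groups=4, experts_per_group=4, d_ff=128):
        super().__init__()
        self.n_groups, self.experts_per_group = n_groups, experts_per_group
        self.group_router = nn.Linear(d_model, n_groups)           # stage 1: pick a group
        self.expert_routers = nn.ModuleList(                        # stage 2: pick an expert in the group
            nn.Linear(d_model, experts_per_group) for _ in range(n_groups))
        self.experts = nn.ModuleList(
            nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(experts_per_group))
            for _ in range(n_groups))

    def forward(self, x):                         # x: (tokens, d_model)
        group_probs = F.softmax(self.group_router(x), dim=-1)
        group_idx = group_probs.argmax(dim=-1)    # hard inter-group assignment
        # Inter-group load-balancing auxiliary loss: token usage vs. router probability.
        usage = F.one_hot(group_idx, self.n_groups).float().mean(0)
        balance_loss = self.n_groups * (usage * group_probs.mean(0)).sum()

        out = torch.zeros_like(x)
        for g in range(self.n_groups):
            sel = group_idx == g
            if not sel.any():
                continue
            xg = x[sel]
            # Intra-group routing: each token goes to one specialized expert.
            e_idx = self.expert_routers[g](xg).argmax(dim=-1)
            yg = torch.zeros_like(xg)
            for e in range(self.experts_per_group):
                hit = e_idx == e
                if hit.any():
                    yg[hit] = self.experts[g][e](xg[hit])
            out[sel] = yg
        return out, balance_loss

# Usage: route 32 token embeddings and read off the balance loss.
y, aux = TwoStageMoE()(torch.randn(32, 64))
print(y.shape, aux.item())
```

Splitting the routing into a coarse group choice and a fine expert choice is what lets the balancing pressure act at the group level while experts inside a group are free to specialize.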
🧠 AI · Neutral · arXiv – CS AI · 10h ago · 6/10

Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari

Researchers demonstrate that transformer-based world models exhibit distinct scaling behaviors across Atari environments, with joint multi-task training stabilizing performance gains. The study finds that individual environments respond differently to model scaling, but unified training across 26 Atari games yields consistent improvements regardless of inherent task complexity.
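
The joint multi-task setup can be pictured as a single transformer world model trained on batches that mix trajectories from all games. A rough sketch under assumed shapes and placeholder data is below; the game list, latent encodings, embedding-based conditioning, and loss are stand-ins for illustration, not the paper's configuration.

```python
# Rough sketch of joint multi-task world-model training across many Atari games
# (game set, latents, conditioning scheme, and loss are illustrative assumptions).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_GAMES, d_model, seq_len = 26, 64, 8            # 26 games trained jointly

game_embed = nn.Embedding(NUM_GAMES, d_model)      # tells the model which game a clip is from
world_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(d_model, d_model)                 # predicts the next latent state
params = (list(game_embed.parameters()) + list(world_model.parameters())
          + list(head.parameters()))
opt = torch.optim.Adam(params, lr=1e-4)

def sample_joint_batch(batch_size=16):
    """Mix trajectories from all games into one batch (unified multi-task training)."""
    game_ids = torch.tensor(random.choices(range(NUM_GAMES), k=batch_size))
    # Placeholder latents; a real pipeline would encode per-game observations.
    latents = torch.randn(batch_size, seq_len, d_model)
    targets = torch.randn(batch_size, seq_len, d_model)
    return game_ids, latents, targets

for step in range(3):                              # a few illustrative updates
    game_ids, latents, targets = sample_joint_batch()
    x = latents + game_embed(game_ids)[:, None, :]     # condition on game identity
    loss = F.mse_loss(head(world_model(x)), targets)   # next-latent prediction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: loss {loss.item():.4f}")
```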