y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#mixture-of-depths News & Analysis

1 article tagged with #mixture-of-depths. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBullisharXiv – CS AI · Mar 177/10
🧠

Mixture-of-Depths Attention

Researchers introduce Mixture-of-Depths Attention (MoDA), a new mechanism for large language models that allows attention heads to access key-value pairs from both current and preceding layers to combat signal degradation in deeper models. Testing on 1.5B-parameter models shows MoDA improves perplexity by 0.2 and downstream task performance by 2.11% with only 3.7% computational overhead while maintaining 97.3% of FlashAttention-2's efficiency.

🏢 Perplexity