AI · Bullish · arXiv – CS AI · 8h ago · 6/10
Unlocking Feature Learning in Gated Delta Networks at Scale
Researchers have derived scaling rules for Gated Delta Networks (GDNs) by extending the Maximal Update Parametrization (μP) framework, enabling hyperparameter transfer that stays stable across model sizes. The work addresses a bottleneck in training efficient sub-quadratic language models: with these rules, a learning rate tuned at one model width transfers zero-shot to other widths.
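To make the idea of width-wise learning-rate transfer concrete, here is a minimal sketch of the generic μP-style rule commonly used with Adam: the learning rate for hidden weight matrices shrinks in proportion to width, while vector-like parameters (biases, gains) keep the base rate. This is an illustration only, not the GDN-specific rules from the paper, which are not given in this summary. The function name `mup_adam_lr`, the `kind` labels, and the example widths are all hypothetical, and output-layer handling (which differs between μP formulations) is deliberately left out.

```python
def mup_adam_lr(base_lr: float, base_width: int, width: int, kind: str) -> float:
    """Width-adjusted Adam learning rate under a generic muP-style rule.

    Assumption: this is the textbook rule for standard layers, not the
    GDN-specific rule from the paper.
    kind: "hidden" for weight matrices whose fan-in grows with width,
          "vector" for biases, gains, and other vector-like parameters.
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    if kind == "hidden":
        # Hidden matrices: learning rate scales inversely with width.
        return base_lr * base_width / width
    if kind == "vector":
        # Vector-like parameters: learning rate is width-independent.
        return base_lr
    raise ValueError(f"unknown parameter kind: {kind!r}")


if __name__ == "__main__":
    # Tune once at a small proxy width, then reuse the same base_lr at larger widths.
    base_lr, base_width = 1e-3, 256  # hypothetical tuned values
    for width in (256, 512, 1024, 2048):
        hidden = mup_adam_lr(base_lr, base_width, width, "hidden")
        vector = mup_adam_lr(base_lr, base_width, width, "vector")
        print(f"width={width:5d}  hidden_lr={hidden:.2e}  vector_lr={vector:.2e}")
```

In this sketch the only value that has to be tuned is `base_lr` at the proxy width. Every larger model derives its per-parameter learning rates from it, which is the sense in which the transfer is "zero-shot".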