AI · Bullish · arXiv – CS AI · 8h ago · 6/10
PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks
Researchers introduce PolyGLU, a new transformer architecture that enables dynamic routing among multiple activation functions, mimicking biological neural diversity. The 597M-parameter PolychromaticLM model shows emergent specialization patterns and achieves strong performance despite training on significantly fewer tokens than comparable models.
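As a rough illustration of what "dynamic routing among multiple activation functions" could look like inside a gated feed-forward block, here is a minimal pure-Python sketch. It is not the paper's implementation: the soft (softmax) router, the choice of four candidate activations, the per-input routing granularity, and every name (`routed_glu`, `W_router`, and so on) are assumptions made for illustration only.

```python
import math

# Candidate activations the router can mix. This particular set is an assumption.
def relu(z):
    return max(0.0, z)

def silu(z):
    return z / (1.0 + math.exp(-z))

def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))

ACTIVATIONS = [relu, silu, gelu, math.tanh]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def routed_glu(x, W_gate, W_up, W_down, W_router):
    """Hypothetical state-conditional GLU block.

    The router reads the input state x and produces one mixing weight per
    activation; the gate branch then uses the weighted mix of activations.
    """
    probs = softmax(matvec(W_router, x))   # one weight per activation
    gate_pre = matvec(W_gate, x)           # gate pre-activations
    up = matvec(W_up, x)                   # linear "up" branch
    hidden = [
        sum(p * act(g) for p, act in zip(probs, ACTIVATIONS)) * u
        for g, u in zip(gate_pre, up)
    ]
    return matvec(W_down, hidden), probs

# Toy usage: model width 2, hidden width 3, zero router weights (uniform mix).
x = [0.5, -1.0]
W_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_up = [[1.0, 1.0], [0.5, -0.5], [0.0, 1.0]]
W_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
W_router = [[0.0, 0.0] for _ in ACTIVATIONS]
y, probs = routed_glu(x, W_gate, W_up, W_down, W_router)
```

In this sketch the routing weights depend only on the input vector, so different inputs can favor different activations. Whether the actual model routes per token, per neuron, or per layer, and whether it uses soft or hard selection, is not stated in the summary above.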
Nvidia