AI · Neutral · arXiv – CS AI · 9h ago · 6/10
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
Researchers propose a PC (Preconditioning) layer that parameterizes weights through a polynomial to stabilize the training of large language models while remaining computationally efficient. In Llama-1B pre-training, the approach improves on standard transformers, and the paper provides theoretical convergence guarantees for certain network architectures.
🧠 Llama