PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
Researchers propose a PC (Preconditioning) layer that uses polynomial weight parameterization to stabilize training of large language models while maintaining computational efficiency. The approach demonstrates performance improvements over standard transformers during Llama-1B pre-training and includes theoretical guarantees for convergence in certain network architectures.