AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
Researchers introduce AdaMeZO, a zeroth-order optimizer that combines the memory efficiency of MeZO with Adam-style moment estimation for fine-tuning large language models, without storing the moment vectors themselves. The method converges faster than MeZO, requiring up to 70% fewer forward passes, while also reducing GPU memory requirements.
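As a rough illustration of the zeroth-order machinery AdaMeZO builds on, the sketch below implements a plain MeZO-style SPSA step in PyTorch: the random perturbation direction is regenerated from a seed rather than stored, so only two forward passes and no gradient or moment buffers are needed. The names `perturb`, `mezo_step`, `loss_fn`, and the hyperparameters `lr`, `eps`, and `seed` are illustrative assumptions, and AdaMeZO's Adam-style rescaling of this update is not reproduced here.

```python
import torch

def perturb(params, seed, scale):
    # Regenerate the same random direction z from the seed instead of
    # storing it; this is the MeZO memory-saving trick.
    torch.manual_seed(seed)
    for p in params:
        z = torch.randn_like(p)
        p.data.add_(z, alpha=scale)

@torch.no_grad()
def mezo_step(loss_fn, params, lr=1e-6, eps=1e-3, seed=0):
    # Two forward passes: at theta + eps*z and at theta - eps*z.
    perturb(params, seed, +eps)
    loss_plus = float(loss_fn())
    perturb(params, seed, -2 * eps)
    loss_minus = float(loss_fn())
    perturb(params, seed, +eps)  # restore the original parameters

    # Scalar projected-gradient estimate along the shared direction z.
    proj_grad = (loss_plus - loss_minus) / (2 * eps)

    # Plain SGD-style zeroth-order update; regenerating z once more means
    # no per-parameter direction or moment buffers are ever kept in memory.
    perturb(params, seed, -lr * proj_grad)
    return loss_plus
```

In practice a fresh `seed` would be drawn at every iteration, and the summary above indicates that AdaMeZO's contribution is replacing the fixed `lr` scaling with an Adam-style adaptive step computed without materializing the moment tensors.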