AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
Researchers introduce AdaMeZO, a new zeroth-order optimizer that combines the memory efficiency of MeZO with Adam-style moment estimation for fine-tuning large language models. The method converges faster than MeZO, requiring up to 70% fewer forward passes, while keeping GPU memory requirements far below those of backpropagation-based training.
AdaMeZO addresses a critical bottleneck in modern AI development: the prohibitive computational cost of fine-tuning large language models. Traditional backpropagation requires extensive GPU memory to store activations, gradients, and optimizer states, while MeZO offers a memory-efficient alternative that uses only forward passes. However, MeZO sacrifices convergence speed by ignoring the characteristics of the loss landscape.

The new optimizer bridges this gap by implementing Adam-style moment estimates, which adapt the step size along each parameter dimension based on gradient history, without storing the moment buffers in memory (see the sketch below). This neutralizes the memory penalty of standard Adam, whose two moment buffers would otherwise roughly triple the parameter-related memory footprint.

This matters because GPU scarcity remains a significant barrier to LLM fine-tuning, particularly for researchers and organizations without enterprise-scale infrastructure. The up-to-70% reduction in forward passes translates to tangible cost and time savings, making advanced model customization practical for a wider range of use cases.

From a developer perspective, AdaMeZO lowers the hardware requirements for competitive fine-tuning: organizations can achieve comparable results on consumer-grade or mid-range GPUs that would previously have required high-end infrastructure. The research validates this through extensive experiments and trajectory visualizations showing AdaMeZO's adaptive behavior across different loss landscapes. As LLM deployment becomes increasingly specialized and domain-specific, optimization techniques that reduce computational friction directly affect the pace of innovation, and the work shows how algorithmic refinement can expand the pool of practitioners able to advance AI systems.
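MeZO's core mechanism, which AdaMeZO builds on, is the SPSA gradient estimator: perturb all parameters along a random direction, take two forward passes, and regenerate the perturbation from a stored random seed instead of keeping it in memory. Below is a minimal PyTorch sketch of that estimator; the function name, hyperparameter values, and `loss_fn` interface are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the MeZO-style SPSA estimate that AdaMeZO builds on
# (names, defaults, and structure are illustrative, not the paper's code).
import torch

def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=None):
    """One zeroth-order update using two forward passes and no gradients."""
    seed = seed if seed is not None else torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same random perturbation from the seed instead of
        # storing it; this trick keeps memory at inference level.
        gen = torch.Generator(device="cpu").manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen, dtype=p.dtype)
            p.data.add_(scale * z.to(p.device))

    with torch.no_grad():
        perturb(+eps)
        loss_plus = loss_fn(model, batch)
        perturb(-2 * eps)
        loss_minus = loss_fn(model, batch)
        perturb(+eps)  # restore the original parameters

        # Scalar projected gradient; the full estimate is g * z per tensor.
        g = (loss_plus - loss_minus) / (2 * eps)

        # Apply the update by regenerating z once more from the same seed.
        gen = torch.Generator(device="cpu").manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen, dtype=p.dtype)
            p.data.add_(-lr * g * z.to(p.device))
    return loss_plus
```

The seed trick is what keeps memory at inference level: the perturbation and the update direction are regenerated on demand rather than stored, so the only persistent state is the model weights themselves.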
- AdaMeZO enables Adam-style optimization for memory-constrained LLM fine-tuning without storing moment estimates (see the sketch after this list)
- The method achieves up to 70% fewer forward passes than baseline MeZO while maintaining memory efficiency
- GPU memory requirements remain significantly lower than those of standard backpropagation-based methods such as full Adam
- Theoretical analysis and empirical validation confirm AdaMeZO's ability to navigate diverse loss landscapes effectively
- The advance lowers barriers to LLM fine-tuning for resource-constrained researchers and organizations
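The summary does not spell out how AdaMeZO obtains Adam-style moment information without storing it, so the following is purely an illustrative assumption rather than the paper's method: because each zeroth-order gradient estimate factors into a scalar projected gradient and a perturbation regenerable from its seed, moments could in principle be reconstructed from a history of (seed, g) scalar pairs, trading recomputation for the two O(d) buffers standard Adam keeps. All names below (`reconstruct_moments`, `history`) are hypothetical.

```python
# Illustrative assumption only; this is not the published AdaMeZO scheme.
# Each ZO gradient estimate is a scalar g_t times a perturbation z_t that
# can be regenerated from seed_t, so Adam-style moments can in principle
# be rebuilt from an O(T) scalar history instead of two O(d) buffers.
import torch

def reconstruct_moments(param_shape, history, beta1=0.9, beta2=0.999,
                        device="cpu", dtype=torch.float32):
    """Rebuild Adam-style moments for one parameter tensor.

    history: list of (seed, g) pairs, oldest first; each gradient estimate
    was g * z, where z is the unit-Gaussian perturbation drawn from seed.
    Bias correction is omitted for brevity.
    """
    m = torch.zeros(param_shape, dtype=dtype)  # first moment
    v = torch.zeros(param_shape, dtype=dtype)  # second moment
    for seed, g in history:
        gen = torch.Generator(device="cpu").manual_seed(seed)
        z = torch.randn(param_shape, generator=gen, dtype=dtype)
        grad = g * z
        m.mul_(beta1).add_((1 - beta1) * grad)
        v.mul_(beta2).add_((1 - beta2) * grad * grad)
    return m.to(device), v.to(device)
```

A practical method would need to amortize this recomputation (the loop costs one perturbation regeneration per history entry), but the sketch illustrates why moment information need not imply moment storage in the zeroth-order setting.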