🧠 AI | Neutral | Importance 6/10

Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization

arXiv – CS AI | Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang
🤖 AI Summary

Researchers introduce Singularity-aware Adam (S-Adam), a novel optimizer addressing instability in deep learning with non-smooth components like ReLU activations. The method uses a Local Geometric Instability metric to dynamically adjust step sizes, demonstrating up to 6% accuracy improvements on benchmark datasets while mitigating gradient oscillations.

Analysis

Modern deep learning architectures introduce non-smooth elements that violate traditional optimization assumptions, creating challenges for adaptive optimizers like Adam. Gradient chattering—violent oscillations from conflicting signals in the Clarke subdifferential—degrades convergence and generalization performance, particularly in quantization-aware training and small-batch scenarios. S-Adam addresses this fundamental limitation by introducing a computationally efficient Local Geometric Instability (LGI) metric that quantifies subdifferential diameter through randomized directional derivatives, enabling real-time detection of unstable regions.
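
To make the idea of probing a subdifferential with random directions concrete, here is a minimal sketch. It is not the paper's LGI definition, which this summary does not give. It assumes a simple proxy: along a unit direction d, the sum of the forward and backward one-sided slopes is close to zero where the function is smooth and equals the width of the subdifferential along d at a convex kink. The function name `lgi_estimate` and the parameters `num_probes`, `eps`, and `seed` are hypothetical.

```python
import math
import random


def lgi_estimate(f, x, num_probes=8, eps=1e-3, seed=0):
    """Hypothetical instability proxy, NOT the paper's exact LGI formula.

    Draws random unit directions and measures how far the two one-sided
    finite-difference slopes along each direction are from cancelling.
    """
    rng = random.Random(seed)
    fx = f(x)
    worst = 0.0
    for _ in range(num_probes):
        # Random direction on the unit sphere (normalized Gaussian vector).
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # One-sided slopes along +d and -d.
        forward = (f([xi + eps * di for xi, di in zip(x, d)]) - fx) / eps
        backward = (f([xi - eps * di for xi, di in zip(x, d)]) - fx) / eps
        # Smooth point: forward + backward is O(eps). Kink: it stays large.
        worst = max(worst, abs(forward + backward))
    return worst


def l1_norm(v):
    # Non-smooth at the origin, like a sum of ReLU-style kinks.
    return sum(abs(c) for c in v)


def squared_norm(v):
    # Smooth everywhere.
    return sum(c * c for c in v)


kink_score = lgi_estimate(l1_norm, [0.0, 0.0, 0.0])
smooth_score = lgi_estimate(squared_norm, [1.0, 1.0, 1.0])
```

Under these assumptions, `kink_score` is at least about 2 (the subdifferential of the L1 norm at the origin spans [-1, 1] in every coordinate), while `smooth_score` is on the order of `eps`. Each probe costs two extra function evaluations, so the number of probes trades detection quality against overhead.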

The optimizer's adaptive damping mechanism exponentially decelerates updates in high-instability zones while maintaining acceleration in smooth loss landscapes, balancing exploration and stability. This approach builds on differential inclusion theory, providing formal convergence guarantees to Clarke stationary points at optimal O(1/√T) rates—matching theoretical benchmarks for non-smooth optimization. The method proves particularly valuable for quantization-aware training, where discrete operations create inherent non-smoothness, and for distributed learning with small batch sizes where noisy gradients amplify instability.
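
The damping behaviour described above can be illustrated with a small sketch. The summary does not state S-Adam's actual update rule, so the code below assumes one plausible form: a standard Adam step whose learning rate is multiplied by exp(-damping * lgi). The names `damped_adam_step`, `init_state`, and `damping` are hypothetical, and the instability score is passed in rather than computed.

```python
import math


def init_state(n):
    # First and second moment buffers plus a step counter, as in Adam.
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def damped_adam_step(param, grad, state, lgi, lr=1e-3, beta1=0.9,
                     beta2=0.999, eps=1e-8, damping=1.0):
    """One Adam-style step with an ASSUMED exponential damping factor.

    scale = exp(-damping * lgi) is an illustrative choice, not the
    paper's formula: it equals 1 when lgi == 0 (plain Adam) and shrinks
    the step exponentially as the instability score grows.
    """
    state["t"] += 1
    t = state["t"]
    scale = math.exp(-damping * lgi)
    new_param = []
    for i, (p, g) in enumerate(zip(param, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias-corrected moment estimates.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_param.append(p - lr * scale * m_hat / (math.sqrt(v_hat) + eps))
    return new_param


# Same gradient, different instability scores.
smooth = damped_adam_step([1.0], [2.0], init_state(1), lgi=0.0)
kinked = damped_adam_step([1.0], [2.0], init_state(1), lgi=2.0)
smooth_step = 1.0 - smooth[0]
kinked_step = 1.0 - kinked[0]
```

With `lgi=0.0` the step reduces to an ordinary first Adam step of roughly `lr`; with `lgi=2.0` the same gradient produces a step smaller by a factor of exp(-2). A multiplicative factor leaves the moment estimates untouched, so the optimizer resumes full-speed updates as soon as the score drops.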

Empirical results show consistent improvements on the CIFAR-100 and TinyImageNet benchmarks, with accuracy gains of up to 6% on CIFAR-100 and 3% on TinyImageNet over AdamW and Prox-SGD. Beyond its academic significance, this advance matters for deploying quantized neural networks on edge devices and mobile platforms, where computational efficiency depends on stable training. The work addresses a growing gap between theoretical optimization assumptions and practical architectural realities, and is relevant to practitioners implementing state-of-the-art models with non-smooth activation functions and quantization operators.

Key Takeaways
  • S-Adam introduces a Local Geometric Instability (LGI) metric to detect and mitigate gradient chattering in non-smooth loss landscapes.
  • Achieves up to 6% accuracy improvements on CIFAR-100 and 3% on TinyImageNet compared to AdamW and Prox-SGD.
  • Provides formal convergence guarantees to Clarke stationary points at optimal O(1/√T) rates using differential inclusion theory.
  • Particularly beneficial for quantization-aware training and small-batch learning scenarios with high gradient noise.
  • Adaptive damping mechanism balances convergence speed in smooth regions with stability in geometrically unstable areas.