AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers provide a theoretical proof that sign-based optimization algorithms like SignSGD outperform standard SGD under specific conditions involving ℓ1-norm stationarity and sparse noise, with complexity improvements that scale with the problem dimension d. The analysis bridges theory and practice by demonstrating these advantages during GPT-2 pretraining, explaining why sign-based methods succeed in large language model training even though they previously lacked theoretical justification.
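For readers unfamiliar with the update rule, the following is a minimal sketch of a textbook SignSGD step next to a plain SGD step. It is not taken from the paper; the helper names (`sign`, `sgd_step`, `signsgd_step`) and the toy numbers are assumptions chosen for illustration.

```python
def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sgd_step(params, grads, lr):
    # Standard SGD: each coordinate moves in proportion to its gradient.
    return [p - lr * g for p, g in zip(params, grads)]


def signsgd_step(params, grads, lr):
    # SignSGD: each coordinate moves by exactly lr (or not at all when the
    # gradient is zero); only the sign of the gradient is used.
    return [p - lr * sign(g) for p, g in zip(params, grads)]


# Toy values (hypothetical): one small, one very large, one zero gradient.
params = [1.0, -2.0, 0.5]
grads = [0.3, -40.0, 0.0]
lr = 0.1

sgd_out = sgd_step(params, grads, lr)
sign_out = signsgd_step(params, grads, lr)
print("SGD:    ", sgd_out)
print("SignSGD:", sign_out)
```

In this toy case the large gradient on the second coordinate moves the SGD iterate by 4.0, while SignSGD moves every nonzero-gradient coordinate by the same 0.1, which is the per-coordinate behavior that sign-based analyses reason about.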
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers introduce the first theoretical framework analyzing convergence of adaptive optimizers like Adam and Muon under floating-point quantization in low-precision training. The study shows these algorithms maintain near full-precision performance when mantissa length scales logarithmically with iterations, with Muon proving more robust than Adam to quantization errors.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers show that Sharpness-Aware Minimization (SAM), a popular deep learning training method, is unstable near saddle points and may escape them less effectively than standard gradient descent. The study demonstrates that momentum and batch-size adjustments are critical for mitigating these instabilities and achieving strong generalization performance.
AI · Neutral · arXiv – CS AI · 2d ago · 5/10
🧠A research paper proposes that deep learning models can be understood through tame geometry (o-minimality), a mathematical framework that enables convergence guarantees for stochastic gradient descent in nonsmooth, nonconvex settings. This perspective offers a formal foundation for analyzing AI system behavior and training stability.
AI · Neutral · arXiv – CS AI · 6d ago · 5/10
🧠Researchers propose STHTD-MP, a new machine learning algorithm that improves off-policy prediction by using behavior-policy information to optimize the geometry of gradient temporal-difference methods. The method demonstrates faster convergence than existing approaches like GTD2-MP under certain conditions, with theoretical guarantees and empirical validation on standard benchmarks.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers introduce Singularity-aware Adam (S-Adam), a novel optimizer addressing instability in deep learning with non-smooth components like ReLU activations. The method uses a Local Geometric Instability metric to dynamically adjust step sizes, demonstrating up to 6% accuracy improvements on benchmark datasets while mitigating gradient oscillations.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠Researchers identify a critical failure mode in test-time reinforcement learning (TTRL) where majority voting locks onto incorrect answers, permanently suppressing correct signals in low-ability problems. They introduce TTRL-Guard, a framework using flip-rate monitoring and selective updating to prevent this 'Correct-Answer Extinction Window,' achieving 54% relative improvement on AIME 2025 benchmarks.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers propose PANDA, a novel bilevel optimization algorithm for reinforcement learning that handles competitive multi-agent scenarios modeled as zero-sum Markov games. The method achieves state-of-the-art convergence rates without requiring second-order derivatives, advancing RL applications in incentive design and competitive environments.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers analyze deep unfolding neural networks derived from forward-backward-splitting algorithms, establishing convergence guarantees for training problems toward deep-layer limit systems. The work provides theoretical foundations for understanding how neural networks unrolled from optimization algorithms learn, with implications for designing more stable and interpretable deep learning architectures.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers discover that neural networks across different modalities (vision, point clouds, language) converge toward shared representations, with non-language modalities systematically moving toward language's neighborhood structure rather than vice versa. Using directional analysis, they attribute this asymmetry to language representations occupying more compact feature space, proposing that language serves as the asymptotic attractor in multimodal representation learning.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers present a formal framework for recursive reasoning systems that addresses two critical design challenges: how to represent evolving reasoning states and when to terminate iteration. The paper introduces an epistemic state graph representation and proposes the 'order-gap' metric as a stopping criterion, with theoretical guarantees for when this criterion provides meaningful guidance.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose a decentralized gradient descent framework for optimizing time-varying objectives across distributed networks processing streaming data. The work analyzes tracking error using temporal weighting strategies, showing uniform weighting achieves O(1/t) convergence while exponential discounting maintains non-vanishing error floors, with implications for distributed machine learning systems.
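One possible intuition for the contrast between the two weighting schemes can be sketched numerically. The sketch below is not the paper's analysis: it assumes, purely for illustration, that each streamed sample carries independent noise of equal variance, so the variance of a weighted average is proportional to the sum of squared weights. The function names and the discount factor 0.9 are hypothetical.

```python
def uniform_weights(t):
    # Every one of the t samples seen so far gets weight 1/t.
    return [1.0 / t] * t


def discounted_weights(t, gamma):
    # Sample k (1-indexed, t is the newest) gets weight proportional to
    # gamma**(t - k); weights are normalized to sum to 1.
    raw = [gamma ** (t - k) for k in range(1, t + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def noise_factor(weights):
    # Under the i.i.d.-noise assumption, the variance of the weighted
    # average equals (noise variance) * (sum of squared weights).
    return sum(w * w for w in weights)


for t in (10, 100, 1000):
    u = noise_factor(uniform_weights(t))
    e = noise_factor(discounted_weights(t, gamma=0.9))
    print(f"t={t:5d}  uniform={u:.6f}  discounted={e:.6f}")
```

Under these assumptions the uniform factor equals 1/t and keeps shrinking, while the discounted factor levels off near (1 - gamma) / (1 + gamma), which is consistent with a non-vanishing floor. How this interacts with a drifting objective is exactly what the paper's tracking-error analysis addresses and is not captured here.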
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose REED (Resource-Element Energy Difference), a noncoherent aggregation method for over-the-air federated learning that eliminates the need for instantaneous channel state information. The technique uses energy differences across orthogonal resource elements to aggregate signed updates, achieving convergence rates comparable to conventional methods while reducing practical implementation complexity in wireless systems.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce TAP (Two-Stage Adaptive Personalization), a novel federated learning framework that enables personalized fine-tuning of foundation models across clients with heterogeneous tasks and modalities. The method uses mismatched architectures to prevent cross-task interference and post-FL distillation to recover shared knowledge, advancing practical deployment of AI systems in distributed environments.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose R-GTD, a regularized gradient temporal-difference learning algorithm that maintains convergence guarantees even when the feature interaction matrix becomes singular, a case that existing GTD methods do not handle in practice. The geometric analysis provides explicit error bounds and addresses a key stability challenge in off-policy reinforcement learning with function approximation.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers provide theoretical foundations for Reinforcement Learning with Verifiable Rewards (RLVR), a technique for post-training large language models using binary feedback. The analysis introduces the 'Gradient Gap' concept to explain convergence dynamics and derives critical step-size thresholds that determine whether training succeeds or fails, with implications for practical implementations like length normalization.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose a novel hybrid fine-tuning method for Large Language Models that combines full parameter updates with Parameter-Efficient Fine-Tuning (PEFT) modules using zeroth-order and first-order optimization. The approach addresses computational constraints of full fine-tuning while overcoming PEFT's limitations in knowledge acquisition, backed by theoretical convergence analysis and empirical validation across multiple tasks.
AI · Neutral · arXiv – CS AI · Mar 4 · 5/10 · 3
🧠A research paper establishes the first theoretical separation between the Adam and SGD optimization algorithms, proving that Adam achieves better high-probability convergence guarantees. The study provides mathematical backing for Adam's superior empirical performance through an analysis of its second-moment normalization.
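To make "second-moment normalization" concrete, here is a minimal scalar sketch of the standard Adam update with bias correction, compared against a plain SGD step. It illustrates the textbook update only, not the paper's proof technique; the default hyperparameters and the test gradients are assumptions chosen for illustration.

```python
import math


def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction for the zero-initialized averages (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Second-moment normalization: divide by the root of the squared-gradient
    # average, so the step size depends little on the gradient's scale.
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


def sgd_step(p, g, lr=1e-3):
    # Plain SGD: the step is directly proportional to the gradient.
    return p - lr * g


# First step from p = 0 for gradients spanning four orders of magnitude.
results = []
for g in (0.01, 1.0, 100.0):
    p_adam, _, _ = adam_step(0.0, g, m=0.0, v=0.0, t=1)
    results.append((g, p_adam, sgd_step(0.0, g)))
    print(f"g={g:7.2f}  adam={p_adam:.6f}  sgd={sgd_step(0.0, g):.6f}")
```

On the first step the Adam update has magnitude close to `lr` for all three gradients, whereas the SGD update scales linearly with the gradient.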