#gradient-optimization News & Analysis

22 articles tagged with #gradient-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL

Researchers propose Optimal Token Baseline (OTB), a new variance reduction technique for reinforcement learning in large language models that addresses training instability in long-horizon tasks. The method reduces token consumption by over 65% while maintaining performance equivalent to models using 8x larger batch sizes, offering significant efficiency gains for LLM-RL training.
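For context, the reason any baseline can reduce variance without biasing the policy gradient is the standard identity below. This is the textbook argument, not the specific OTB construction from the paper; the symbols $\pi_\theta$, $s_t$, $a_t$, $R$, and $b$ are notation assumed here, and $b$ must not depend on the sampled action $a_t$.

```latex
\begin{aligned}
\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)\right]
  &= b(s_t) \sum_{a_t} \pi_\theta(a_t \mid s_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
  &= b(s_t)\, \nabla_\theta \sum_{a_t} \pi_\theta(a_t \mid s_t)
   = b(s_t)\, \nabla_\theta 1 = 0, \\[4pt]
\text{hence}\quad
\mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(R - b(s_t)\bigr)\right]
  &= \mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R\right].
\end{aligned}
```

Because the expectation is unchanged for every admissible $b$, the baseline can be chosen purely to minimize the variance of the estimator.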

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Internalizing Geometric Law: Learning from Solver Residuals for Precision-Critical Generation

Researchers introduce PyGeoX, a geometric constraint solver and benchmark that addresses hallucination problems in large language models for precision-critical tasks like technical design. They identify a failure mode called Outlier Gradient Masking in standard reward schemes and propose Saturating Additive Rewards (SAR) to improve constraint satisfaction, achieving 2.3x performance gains on hard problems.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Plug-and-Play Guidance for Discrete Diffusion Models via Gradient-Informed Logit Correction

Researchers have developed GILC, a plug-and-play framework that enables efficient controllable generation in discrete diffusion models without retraining. The method uses gradient-informed logit correction and a Jacobian-free mechanism to stabilize guidance across DNA, protein, and molecular generation tasks, achieving state-of-the-art results.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation

Researchers propose a basis rotation framework to address gradient staleness in asynchronous pipeline parallelism, a technique used for distributed AI training. By aligning the optimizer's coordinate system with the Hessian eigenbasis, the method reduces training iterations by 81.7% compared to existing asynchronous baselines, enabling more efficient large-scale model training.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Researchers propose a gradient-based bilevel optimization method that automatically learns composite loss weights during pretraining by aligning gradients with downstream objectives. The approach reduces hyperparameter tuning overhead to ~30% above baseline training cost while matching or exceeding manually tuned baselines across event-sequence and computer vision tasks.

AI · Bearish · arXiv – CS AI · May 7 · 7/10

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Researchers demonstrate that audio language models can be jailbroken using sparse token optimization rather than dense waveform updates, with Token-Aware Gradient Optimization (TAGO) achieving comparable attack success rates while modifying only 25% of audio tokens. The findings reveal that gradient energy concentrates in specific audio regions, suggesting future AI safety research should account for this heterogeneous token-level structure.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems

Researchers developed HAP (Heterogeneity-Aware Adaptive Pre-ranking), a new framework for recommender systems that addresses gradient conflicts in training by separating easy and hard samples. The system has been deployed in Toutiao's production environment for 9 months, achieving a 0.4% improvement in user engagement without additional computational cost.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Polynomial, trigonometric, and tropical activations

Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases that can successfully train models like GPT-2 and ConvNeXt. The work addresses gradient problems common with polynomial activations and shows these networks can be interpreted as multivariate polynomial mappings.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

GradientStabilizer: Fix the Norm, Not the Gradient

Researchers propose GradientStabilizer, a new technique to address training instability in deep learning by replacing gradient magnitude with statistically stabilized estimates while preserving direction. The method outperforms gradient clipping across multiple AI training scenarios including LLM pre-training, reinforcement learning, and computer vision tasks.
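To make the contrast with gradient clipping concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the summary does not say how the "statistically stabilized estimate" is computed, so an exponential moving average of past gradient norms is assumed here as a stand-in. The names `clip_by_norm`, `EmaNormRescaler`, and `beta` are hypothetical.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def clip_by_norm(grad, max_norm):
    """Standard clipping: shrink the gradient only when its norm exceeds max_norm."""
    norm = l2_norm(grad)
    if norm == 0.0 or norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


class EmaNormRescaler:
    """Keep the gradient direction, replace its magnitude with a smoothed norm.

    ASSUMPTION: the smoothed norm is an exponential moving average of observed
    norms. The paper's actual estimator may differ.
    """

    def __init__(self, beta=0.9):
        self.beta = beta
        self.ema_norm = None  # initialized from the first non-zero gradient

    def __call__(self, grad):
        norm = l2_norm(grad)
        if norm == 0.0:
            return list(grad)
        if self.ema_norm is None:
            self.ema_norm = norm
        else:
            self.ema_norm = self.beta * self.ema_norm + (1.0 - self.beta) * norm
        scale = self.ema_norm / norm
        return [g * scale for g in grad]


rescaler = EmaNormRescaler(beta=0.9)
first = rescaler([3.0, 4.0])      # norm 5.0 seeds the moving average
spike = rescaler([30.0, 40.0])    # a 10x spike is rescaled to the smoothed norm
clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
```

Clipping needs a hand-picked threshold and leaves small gradients untouched, whereas the rescaler always sets the step magnitude from the running statistic; both keep the direction of the raw gradient.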

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models

Researchers introduce Uni-X, a novel architecture for unified multimodal AI models that addresses gradient conflicts between vision and text processing. The X-shaped design uses modality-specific processing at input/output layers while sharing middle layers, achieving superior efficiency and matching 7B parameter models with only 3B parameters.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10

Untargeted Jailbreak Attack

Researchers have developed a new 'untargeted jailbreak attack' (UJA) that can compromise AI safety systems in large language models with a success rate above 80% using only 100 optimization iterations. This gradient-based attack method expands the search space by maximizing unsafety probability without fixed target responses, outperforming existing attacks by over 30%.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

FedSteer: Taming Extreme Gradient Staleness in Federated Learning with Corrective Projections and Caching

FedSteer is a novel federated learning method that addresses gradient staleness in decentralized training systems where clients participate inconsistently. By projecting stale gradients onto a dynamically-maintained subspace and applying corrective techniques, the approach prevents training instability and achieves up to 7% accuracy improvements over existing baselines.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

On the Difficulty of Learning a Meta-network for Training Data Selection

Researchers identify critical obstacles in meta-learning for training data selection (MTS), a technique that uses bi-level optimization to weight synthetic training data. They propose solutions including increased batch sizes and novel feature engineering that collectively achieve 5.49% performance gains over unselected data.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

A Conflict-Aware Penalty and Statistical Loss Framework for Balancing Modalities and Enhancing Stability in Multimodal Sentiment Analysis

Researchers propose a Conflict-aware Penalty and Statistical Loss framework to address gradient norm conflicts in multimodal sentiment analysis, where dominant text modalities suppress weaker acoustic and visual streams. The approach achieves state-of-the-art results on CMU-MOSI benchmarks by balancing modality contributions and stabilizing training dynamics.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

SPAR: Support-Preserving Action Rectification

Researchers introduce SPAR (Support-Preserving Action Rectification), a new offline reinforcement learning method that addresses the fundamental tension between maximizing value and staying true to training data. By anchoring policy improvements to frozen behavior cloning and operating in residual space, SPAR achieves state-of-the-art results on D4RL benchmarks while maintaining data distribution fidelity.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Not All Transitions Matter: Evidence from PPO

Researchers propose a simple technique for stabilizing PPO training: randomly dropping 25% of transitions during rollouts. The method removes gradient redundancy caused by causally dependent state sequences, improving training consistency across multiple environments without algorithmic modifications.
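The transition-dropping idea in the summary above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the paper's code: the summary does not say whether a fixed count or an independent coin flip per transition is used, so a fixed fraction sampled uniformly at random is assumed, and the name `drop_transitions` and its parameters are hypothetical.

```python
import random


def drop_transitions(transitions, drop_frac=0.25, seed=None):
    """Return the rollout with a random `drop_frac` of its transitions removed.

    ASSUMPTION: a fixed number of transitions is dropped uniformly at random,
    and the survivors keep their original temporal order.
    """
    rng = random.Random(seed)
    n_total = len(transitions)
    n_keep = n_total - int(round(drop_frac * n_total))
    # Sample the indices to keep, then sort them to preserve rollout order.
    keep_idx = sorted(rng.sample(range(n_total), n_keep))
    return [transitions[i] for i in keep_idx]


# Stand-in rollout: integers in place of (state, action, reward, ...) tuples.
rollout = list(range(100))
kept = drop_transitions(rollout, drop_frac=0.25, seed=0)
```

In a real PPO loop the surviving transitions would then be batched for the policy and value updates as usual; advantages would normally be computed on the full rollout before dropping, which is also an assumption here.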

AI · Bullish · arXiv – CS AI · May 12 · 6/10

A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment

Researchers propose Pair-GRPO, a unified theoretical framework for LLM alignment that addresses instability and interpretability issues in reinforcement learning from human preferences. The method introduces Soft-Pair-GRPO and Hard-Pair-GRPO variants with proven gradient equivalence, monotonic policy improvement, and superior performance on standard benchmarks.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward

Researchers propose VIGOR, a verifier-free reinforcement learning method for large language models that eliminates dependency on gold labels or domain-specific verifiers by using gradient-norm measurements as intrinsic reward signals. The approach demonstrates measurable improvements over existing baselines on mathematical reasoning and exhibits cross-domain transfer to code tasks, addressing a major scalability constraint in current RL-based LLM training.

AI · Bullish · arXiv – CS AI · Mar 5 · 5/10

JPmHC Dynamical Isometry via Orthogonal Hyper-Connections

Researchers propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a new deep learning framework that improves upon existing Hyper-Connections by replacing identity skips with trainable linear mixers while controlling gradient conditioning. The framework addresses training instability and memory overhead issues in current deep learning architectures through constrained optimization on specific mathematical manifolds.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Researchers introduced GOME, an AI agent that uses gradient-based optimization instead of tree search for machine learning engineering tasks, achieving a 35.1% success rate on MLE-Bench. The study shows gradient-based approaches outperform tree search as AI reasoning capabilities improve, suggesting this method will become more effective as LLMs advance.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10

Long Range Frequency Tuning for QML

Researchers have developed a new quantum machine learning optimization technique using ternary encodings that significantly improves frequency tuning efficiency. The method achieves 22.8% better performance than existing approaches while requiring exponentially fewer encoding gates than traditional fixed-frequency methods.

AI · Bullish · arXiv – CS AI · Mar 2 · 5/10

FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments

Researchers introduce FedDAG, a new clustered federated learning framework that improves AI model training across heterogeneous client environments. The system combines data and gradient similarity metrics for better client clustering and uses a dual-encoder architecture to enable knowledge sharing across clusters while maintaining specialization.