y0news

#gradient-methods News & Analysis

10 articles tagged with #gradient-methods. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

Researchers introduce SemGrad, a gradient-based uncertainty quantification method for large language models that operates in semantic space rather than parameter space, eliminating the computational overhead of sampling-based approaches. The method measures output stability under semantically equivalent input perturbations to gauge LLM confidence, addressing the critical challenge of hallucinations in free-form text generation.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

Researchers identify critical failure modes in multi-objective prompt optimization for LLM judges, finding that jointly optimizing across multiple evaluation criteria reduces gradient task-focus by 59% and combining single-objective prompts degrades performance by 27%. The study reveals fundamental limitations in extending textual gradient methods to multi-criteria scenarios, constraining practical applications of automated LLM judge customization.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

OPD+: Rethinking the Advantage Design for On-Policy Distillation

Researchers propose OPD+, an improved on-policy distillation framework that corrects mathematical flaws in existing knowledge transfer methods between language models. The work proves that stop-gradient operations in current approaches produce biased reward estimates and introduces a corrected optimization framework supporting multiple f-divergence functions, with validation on reasoning and tool-use tasks.

AI · Neutral · arXiv – CS AI · May 29 · 5/10

Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

Researchers propose STHTD-MP, a new machine learning algorithm that improves off-policy prediction by using behavior-policy information to optimize the geometry of gradient temporal-difference methods. The method demonstrates faster convergence than existing approaches like GTD2-MP under certain conditions, with theoretical guarantees and empirical validation on standard benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Select-then-differentiate: Solving Bilevel Optimization with Manifold Lower-level Solution Sets

Researchers present HG-MS, a novel bilevel optimization method that handles cases where lower-level problems have multiple solutions along a manifold rather than a single optimum. The work provides theoretical guarantees for convergence while maintaining computational efficiency through pseudoinverse-based calculations, with practical applications demonstrated in LLM fine-tuning.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

On the optimization dynamics of RLVR: Gradient gap and step size thresholds

Researchers provide theoretical foundations for Reinforcement Learning with Verifiable Rewards (RLVR), a technique for post-training large language models using binary feedback. The analysis introduces the 'Gradient Gap' concept to explain convergence dynamics and derives critical step-size thresholds that determine whether training succeeds or fails, with implications for practical implementations like length normalization.
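
To make the idea of a step-size threshold concrete, here is a minimal Python sketch on a toy quadratic objective. It is not the RLVR analysis from the paper: the function `gradient_descent`, the objective f(x) = 0.5 * curvature * x**2, and the constants are assumptions chosen for illustration. It only shows the textbook behavior that plain gradient descent contracts below a critical step size (here 2 / curvature) and diverges above it.

```python
def gradient_descent(step_size, curvature=1.0, x0=1.0, steps=50):
    """Run plain gradient descent on the toy objective f(x) = 0.5 * curvature * x**2.

    Hypothetical illustration only; this is not the RLVR objective.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x          # derivative of 0.5 * curvature * x**2
        x = x - step_size * grad      # each step multiplies x by (1 - step_size * curvature)
    return x


# For this toy problem the critical step size is 2 / curvature.
threshold = 2.0 / 1.0

below = abs(gradient_descent(0.9 * threshold))  # iterate shrinks toward 0
above = abs(gradient_descent(1.1 * threshold))  # iterate grows without bound

print(f"step size below threshold: |x| = {below:.3e}")
print(f"step size above threshold: |x| = {above:.3e}")
```

The same starting point and the same objective give opposite outcomes depending only on which side of the threshold the step size falls, which is the kind of success-or-failure boundary the summary describes.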

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability

Researchers introduce Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains machine learning models to balance accuracy with explainability by encoding feature importance hierarchies as directed acyclic graphs and using Temporal Integrated Gradients to measure feature contributions. The approach provides statistical guarantees for model interpretability while maintaining convergence properties.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

From $\boldsymbol{\log\pi}$ to $\boldsymbol{\pi}$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight

Researchers introduce Decoupled Gradient Policy Optimization (DGPO), a new reinforcement learning method that improves large language model training by using probability gradients instead of log-probability gradients. The technique addresses instability issues in current methods while maintaining exploration capabilities, showing superior performance across mathematical benchmarks.
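
As background on the two gradient types, the standard log-derivative identity relates them. This is textbook material rather than DGPO's specific weighting scheme, and the notation (policy $\pi_\theta$, action $a$, state $s$) is assumed here for illustration.

```latex
\nabla_\theta \pi_\theta(a \mid s)
  = \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)
```

Read the other way, the log-probability gradient is the probability gradient scaled by $1/\pi_\theta(a \mid s)$, so it can become very large for low-probability actions, while the probability gradient carries no such factor.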

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization

Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.

AI · Neutral · arXiv – CS AI · Mar 26 · 4/10

No Single Metric Tells the Whole Story: A Multi-Dimensional Evaluation Framework for Uncertainty Attributions

Researchers propose a new framework for evaluating uncertainty attribution methods in explainable AI, addressing inconsistent evaluation practices in the field. The study introduces five key properties including a new 'conveyance' metric and demonstrates that gradient-based methods outperform perturbation-based approaches across multiple evaluation criteria.