#training-efficiency News & Analysis

119 articles tagged with #training-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Beyond Trajectory-Level Attribution: Graph-Based Credit Assignment for Agentic Reinforcement Learning

Researchers propose GraphGPO, a novel reinforcement learning method that improves credit assignment in agentic tasks by aggregating trajectories into a state-transition graph rather than relying on coarse-grained outcome-based attribution. This approach enables step-level credit recognition and achieves state-of-the-art performance on challenging benchmarks while significantly improving training efficiency.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Researchers introduce a symmetry-compatible principle for neural network optimizer design that aligns gradient updates with the geometric properties of different parameter types. The approach yields specialized update rules for embeddings, language model heads, SwiGLU MLPs, and mixture-of-experts routers, demonstrating improved validation loss and training stability across multiple language model architectures compared to standard AdamW optimization.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Less is More: Early Stopping Rollout for On-Policy Distillation

Researchers propose Early Stopping Rollout (ESR), a novel distillation technique that improves on-policy student model training by limiting rollout generation to initial response tokens. The method addresses "Off-policy Teacher Decay," where teachers lose effectiveness on later tokens, achieving better performance with higher GPU efficiency than standard approaches.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights

Researchers introduce the Mimic Score, a geometry-based metric for evaluating data quality in large datasets by measuring gradient alignment with pre-trained models. The proposed Grad-Mimic framework enables efficient data selection, reducing training steps for CLIP models by 20.7% and filtering datasets without expensive computations or validation sets.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards

Researchers introduce DUET, a method for optimizing token allocation in reinforcement learning with verifiable rewards that jointly controls which prompts receive rollouts and how long each rollout runs. The technique achieves superior reasoning quality on math and coding benchmarks while using 50% fewer tokens than baseline methods, suggesting efficiency gains don't require sacrificing model performance.

🧠 Llama

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

Researchers introduce One-Step-Train (OST), a new data selection framework for Large Multimodal Models that uses incremental optimization to identify high-quality training samples. The method reduces computational costs by 43% while outperforming existing approaches like LLM-as-a-Judge, demonstrating significant efficiency gains in multimodal model training.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Rethinking Adapter Placement: A Dominant Adaptation Module Perspective

Researchers introduce DomLoRA, a parameter-efficient fine-tuning method that identifies a single 'dominant adaptation module' where most gradient energy concentrates, achieving superior performance with only 0.7% of standard LoRA's trainable parameters. The discovery reveals that optimal adapter placement is architecture-dependent but task-stable across instruction following, reasoning, and code generation applications.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR

Researchers propose Selective Eligibility Traces (S-trace), a new method for reinforcement learning that improves credit assignment in large language models by selectively identifying critical reasoning steps rather than uniformly crediting entire trajectories. The approach demonstrates performance gains of 0.49-3.16% across Qwen models while improving sample and token efficiency compared to existing critic-free algorithms.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Leviathan: Decoupling Input and Output Representations in Language Models

Researchers introduce Leviathan, a Transformer architecture that decouples input embeddings from output projections using learned embedding vectorization (LEV), achieving 9% perplexity reduction at 1.2B parameters with minimal overhead. The approach concentrates improvements on rare tokens while requiring 2.1x fewer training tokens to match baseline performance.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

Researchers propose a novel reinforcement learning framework that automatically generates process-level supervision from outcome-only feedback, eliminating the need for costly external process supervision. This approach enables fine-grained credit assignment in reasoning tasks by having models identify and learn from their own failed trajectories.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs

Researchers demonstrate that masked fine-tuning—a demasking objective borrowed from diffusion models—significantly improves knowledge injection in autoregressive LLMs without requiring expensive paraphrase augmentation and while remaining resistant to the reversal curse. This technique closes the performance gap between autoregressive and diffusion language models, with applications extending to math tasks and large-scale knowledge-intensive benchmarks.

AI · Bullish · arXiv – CS AI · Apr 15 · 7/10

DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Researchers introduce DocSeeker, a multimodal AI system that improves long document understanding through a structured workflow of analysis, localization, and reasoning. It targets a limitation of existing large language models, which struggle with lengthy documents because of high noise levels and weak training signals, and reports superior performance on both short and ultra-long documents.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

Moonwalk: Inverse-Forward Differentiation

Researchers introduce Moonwalk, a new algorithm that solves backpropagation's memory limitations by eliminating the need to store intermediate activations during neural network training. The method uses vector-inverse-Jacobian products and submersive networks to reconstruct gradients in a forward sweep, enabling training of networks more than twice as deep under the same memory constraints.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

Researchers introduced ARL-Tangram, a resource management system that optimizes cloud resource allocation for agentic reinforcement learning tasks involving large language models. The system achieves up to 4.3x faster action completion times and 71.2% resource savings through action-level orchestration, and has been deployed for training MiMo series models.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Mashup Learning: Faster Finetuning by Remixing Past Checkpoints

Researchers propose Mashup Learning, a method that leverages historical model checkpoints to improve AI training efficiency. The technique identifies relevant past training runs, merges them, and uses the result as initialization, achieving 0.5-5% accuracy improvements while reducing training time by up to 37%.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning

Researchers propose a new asynchronous framework for LLM reinforcement learning that separates inference and training deployment, achieving 3-5x improvement in training throughput. The approach maintains on-policy correctness while enabling concurrent inference and training through a producer-consumer pipeline architecture.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

Efficient Agent Training for Computer Use

Researchers introduced PC Agent-E, an efficient AI agent training framework that achieves human-like computer use with minimal human demonstration data. Starting with just 312 human-annotated trajectories and augmenting them with Claude 3.7 Sonnet synthesis, the model achieved a 141% relative improvement and outperformed Claude 3.7 Sonnet by 10% on the WindowsAgentArena-V2 benchmark.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Researchers have developed AReaL, a new asynchronous reinforcement learning system that dramatically improves the efficiency of training large language models for reasoning tasks. The system achieves up to 2.77x training speedup compared to traditional synchronous methods by decoupling generation from training processes.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 3

On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective

Researchers prove that gradient descent in neural networks converges to optimal robustness margins at an extremely slow rate of Θ(1/ln(t)), even in simplified two-neuron settings. This establishes the first explicit lower bound on convergence rates for robustness margins in non-linear models, revealing fundamental limitations in neural network training efficiency.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training

Researchers introduce RACE Attention, a new linear-time alternative to traditional Softmax Attention that can process up to 75 million tokens in a single pass, compared to current GPU-optimized implementations that fail beyond 4 million tokens. The technology uses angular similarity and Gaussian random projections to achieve dramatic efficiency gains while maintaining performance across language modeling and classification tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

ExGRPO: Learning to Reason from Experience

Researchers introduce ExGRPO, a new framework that improves AI reasoning by reusing and prioritizing valuable training experiences based on correctness and entropy. The method shows consistent performance gains of +3.5-7.6 points over standard approaches across multiple model sizes while providing more stable training.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Scaling with Collapse: Efficient and Predictable Training of LLM Families

Researchers demonstrate that training loss curves for large language models can collapse onto universal trajectories when hyperparameters are optimally set, enabling more efficient LLM training. They introduce Celerity, a competitive LLM family developed using these insights, and show that deviation from collapse can serve as an early diagnostic for training issues.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8

FlashOptim: Optimizers for Memory Efficient Training

FlashOptim introduces memory optimization techniques that reduce AI training memory requirements by over 50% per parameter while maintaining model quality. The suite reduces AdamW memory usage from 16 bytes to 7 bytes per parameter through improved master weight splitting and 8-bit optimizer state quantization.

AI · Bullish · MIT News – AI · Feb 26 · 7/10 · 7

New method could increase LLM training efficiency

Researchers have developed a new method that can double the speed of large language model training by utilizing idle computing time while maintaining accuracy. This breakthrough could significantly reduce the computational costs and time required for AI model development.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation

Researchers introduce FORCE, a three-stage reinforcement learning framework that significantly improves the efficiency of fine-tuning Vision-Language-Action models for robotics. By addressing Q-function instability and low-quality exploration data, FORCE achieves 79% absolute improvement in success rates while reducing training time by 32.5%, eliminating the need for human intervention during deployment.
