#training-efficiency News & Analysis

119 articles tagged with #training-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

Researchers propose Transfer-Aware Curriculum (TAC), a machine learning optimization technique that dynamically adjusts training priorities across multiple domains by measuring how well improvements in one area transfer to others. The method achieves superior performance on reasoning tasks compared to fixed curricula, suggesting that cross-domain transferability is a critical factor for training more capable AI systems.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

FORCE: Efficient VLA Reinforcement Fine-Tuning via Value-Calibrated Warm-up and Self-Distillation

Researchers introduce FORCE, a three-stage reinforcement learning framework that significantly improves the efficiency of fine-tuning Vision-Language-Action models for robotics. By addressing Q-function instability and low-quality exploration data, FORCE achieves 79% absolute improvement in success rates while reducing training time by 32.5%, eliminating the need for human intervention during deployment.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

On the Position Bias of On-Policy Distillation

Researchers discover that On-Policy Distillation (OPD) in reinforcement learning suffers from position bias, where later tokens in sequences receive degraded supervision as student rollouts deviate from teacher distributions. They propose Importance-Weighted OPD (IW-OPD), which adaptively reweights tokens based on accumulated distribution discrepancy, achieving up to 6.9-point improvements on benchmark tasks.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams

Researchers introduce DataClaw0, an AI system that actively refines and structures unstructured multimodal data streams to align with specific user and downstream task intents. The 9B-parameter model uses a two-stage pipeline combining supervised fine-tuning with reinforcement learning, validated through a new benchmark and demonstrated improvements in video generation, VQA, and GUI navigation tasks.

AI · Bullish · arXiv – CS AI · Jun 19 · 6/10

Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning

Researchers introduce the Independent Combinatorial Tokens (ICT) framework to improve Large Language Model reasoning by addressing entropy collapse and explosion problems in reinforcement learning. Using Jensen-Shannon divergence to identify critical token branching points, ICT achieves 4.58% average improvement in pass@4 scores across math, commonsense, and Olympiad benchmarks on Qwen models.
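The Jensen-Shannon divergence mentioned in this summary can be sketched in a few lines. This is a generic illustration of the divergence itself, not the ICT implementation: the summary does not say which token distributions ICT compares, so the two distributions `p` and `q` below are hypothetical next-token probability vectors, and the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical next-token distributions over a three-token vocabulary.
p = [0.90, 0.05, 0.05]
q = [0.40, 0.35, 0.25]
divergence = js_divergence(p, q)
```

Unlike KL divergence, this quantity is symmetric and bounded above by ln 2 (in nats), which makes it convenient to threshold when flagging positions where two distributions disagree.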

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Researchers propose Bayesian Manifold Curriculum (BMC), a new framework for training large language models through reinforcement learning that treats problem sampling as a structured bandit problem rather than independent tasks. The approach organizes problems hierarchically and balances difficulty, diversity, and task relevance, showing that difficulty alone is insufficient for optimal model improvement.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Using Explainability as a Training-Time Reliability Signal for Efficient ECG Classification

Researchers introduce ERTS, an explainability-based training method that reduces computational costs for ECG classification by using attention map quality to identify which training samples are genuinely informative versus noisy. The approach demonstrates consistent performance improvements across multiple datasets while significantly lowering training expenses, offering practical efficiency gains for resource-constrained healthcare environments.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

Researchers introduce Stage-Aware Dynamic Weighting (SAW), a novel mechanism for multi-objective reinforcement learning in large language models that addresses the asynchronous nature of reward learning across different objectives. By using coefficient of variation as a real-time informativeness proxy, SAW dynamically reweights objective contributions to improve training efficiency and final performance with minimal computational overhead.
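As a rough illustration of using the coefficient of variation as an informativeness proxy, the sketch below computes it per objective over a window of recent rewards and normalizes the results into weights. The summary does not give SAW's actual update rule, so the direction (higher variation means higher weight), the normalization, the uniform fallback, and all names here are assumptions.

```python
import statistics

def coefficient_of_variation(rewards, eps=1e-8):
    """Population standard deviation divided by |mean| (eps avoids division by zero)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return std / (abs(mean) + eps)

def objective_weights(reward_history, eps=1e-8):
    """Map each objective's recent rewards to a weight proportional to its CV.

    Assumption: an objective whose rewards still vary across samples carries
    more learning signal than one whose rewards are constant.
    """
    cvs = {name: coefficient_of_variation(r, eps) for name, r in reward_history.items()}
    total = sum(cvs.values())
    if total == 0:
        # Every objective is constant: fall back to uniform weights.
        return {name: 1.0 / len(cvs) for name in cvs}
    return {name: cv / total for name, cv in cvs.items()}

# Hypothetical reward windows for two objectives.
history = {"accuracy": [0.0, 1.0, 0.0, 1.0], "format": [1.0, 1.0, 1.0, 1.0]}
weights = objective_weights(history)
```

In this toy window the saturated `format` objective receives no weight, while `accuracy`, whose rewards still vary, receives all of it.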

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

An Agency-Transferring Model-Free Policy Enhancement Technique

Researchers propose a reinforcement learning technique that accelerates policy training by gradually transferring control from a baseline policy to a learnable policy. Compared with training from scratch, it converges faster and reaches better performance while maintaining high success rates throughout learning.
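One simple way to picture a gradual handoff of control is a schedule that decides, at each step, which policy acts. The sketch below is an assumption-heavy illustration rather than the paper's method: the linear schedule, the per-step random handoff, and the names `learner_control_probability` and `select_action` are all hypothetical.

```python
import random

def learner_control_probability(step, total_steps):
    """Linear schedule from 0 (baseline acts) to 1 (learner acts), clamped to [0, 1]."""
    return min(1.0, max(0.0, step / total_steps))

def select_action(state, step, total_steps, baseline_policy, learner_policy, rng=random):
    """Let the learner act with a probability that grows over training."""
    if rng.random() < learner_control_probability(step, total_steps):
        return learner_policy(state)
    return baseline_policy(state)

# Hypothetical policies that only report who acted.
baseline = lambda state: "baseline"
learner = lambda state: "learner"

early = select_action(None, step=0, total_steps=100, baseline_policy=baseline, learner_policy=learner)
late = select_action(None, step=100, total_steps=100, baseline_policy=baseline, learner_policy=learner)
```

Because the baseline acts in most early steps, the agent's success rate stays near the baseline's while the learner is still weak, which is consistent with the summary's claim about maintaining high success rates during learning.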

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

ePC: Fast and Deep Predictive Coding in Digital Simulation

Researchers have reformulated Predictive Coding (PC), a brain-inspired neural network training method, to address its severe computational inefficiency in digital systems. The new error-based PC (ePC) eliminates the signal decay problems inherent in the canonical state-based formulation. It reaches backpropagation-level performance while running orders of magnitude faster, which lets PC scale to deeper architectures on standard hardware.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

Researchers introduce WAV v1, a multi-resolution residual routing technique that improves deep transformer training by capturing directional detail in residual connections beyond simple block summaries. The method shows significant performance gains at 48-layer depths, reducing validation loss by 2.2% on TinyStories and 0.6% on Text8 with minimal parameter overhead.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

TAPO: Tool-Aware Policy Optimization via Credit Transfer for Multimodal Search Agents

Researchers propose TAPO (Tool-Aware Policy Optimization), a method that fixes credit misassignment problems in reinforcement learning for multimodal search agents. The technique improves training efficiency for AI systems that use tools, delivering consistent improvements across multiple benchmarks without requiring additional annotations or computational overhead.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

Researchers introduce Selective-Advantage Adaptive-Horizon GRPO (SA-AH-GRPO), an improved reinforcement learning algorithm for language models that applies asymmetric token-level discounting to stabilize training on reasoning tasks. The method achieves 3.6x reduction in training variance while maintaining peak performance on mathematical reasoning benchmarks, demonstrating more efficient model alignment without sacrificing accuracy.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Researchers introduce RREDCoT, a method for improving reasoning language models by redistributing rewards at the segment level during reinforcement learning. The approach addresses the high variance inherent in current Chain-of-Thought optimization methods by using the model itself to estimate which parts of a reasoning trace deserve higher rewards, without requiring expensive additional computation.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Semi-Offline Reinforcement Learning for Optimized Text Generation

Researchers propose semi-offline reinforcement learning, a novel paradigm that bridges online and offline RL approaches to optimize text generation. The method balances exploration costs with training efficiency while providing theoretical frameworks for comparing different RL settings, demonstrating comparable or superior performance to existing state-of-the-art methods.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Arithmetic Pedagogy for Language Models

Researchers trained a small 86M-parameter language model on Indonesian arithmetic using pedagogically grounded Chain-of-Thought supervision based on the GASING method, achieving over 80% accuracy on held-out problems. The model developed both procedural reasoning and mental-arithmetic capabilities without reinforcement learning, demonstrating that human teaching methods can guide efficient AI training for mathematical reasoning.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Why Muon Outperforms Adam: A Curvature Perspective

Researchers demonstrate that Muon, an optimizer for large language model training, is approximately 2x more efficient than Adam, and attribute the gain to lower Normalized Directional Sharpness (NDS) rather than to smaller update scales. Using curvature analysis and stylized quadratic problems, the work shows that Muon's advantage stems from better balancing of update energy across heterogeneous curvature regions, with benefits amplified in data-imbalanced scenarios.