#sample-efficiency News & Analysis

27 articles tagged with #sample-efficiency. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms

Researchers propose Latent Personality Alignment (LPA), a novel defense mechanism for large language models that achieves adversarial robustness by training on abstract personality traits rather than harmful examples. The method requires fewer than 100 training examples while matching the performance of traditional approaches using 150,000+ harmful prompts, and demonstrates superior generalization to unseen attack vectors.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Rubric-based On-policy Distillation

Researchers introduce ROPD, a rubric-based on-policy distillation framework that replaces teacher logits with structured semantic rubrics for model alignment. The approach achieves up to 10x better sample efficiency than logit-based methods while enabling distillation from proprietary black-box LLMs, addressing a critical scalability limitation in current model training.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks

Researchers introduce LANTERN, a framework that uses large language models to automatically generate task descriptions and intelligently aggregate knowledge from multiple source tasks for reinforcement learning. The system achieves 40-60% improvements in sample efficiency by adaptively weighting source policies based on task similarity and managing teacher-student knowledge transfer through uncertainty-aware gating.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Milestone-Guided Policy Learning for Long-Horizon Language Agents

Researchers introduce BEACON, a milestone-guided policy learning framework that significantly improves training efficiency for long-horizon language agents by solving credit misattribution and sample inefficiency problems. The approach achieves 92.9% success rates on complex tasks—nearly double previous benchmarks—while improving sample utilization from 23.7% to 82.0%.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping

Researchers propose a novel framework that models language model memory as a Markov transition matrix, enabling efficient incorporation of new knowledge without catastrophic forgetting. The approach requires only linear sample complexity in the number of existing tokens and achieves zero forgetting through minimal parameter updates via an embedding-tuning algorithm.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Zero-shot World Models Are Developmentally Efficient Learners

Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics

Researchers introduce COLD-Steer, a training-free framework that enables efficient control of large language model behavior at inference time using just a few examples. The method approximates gradient descent effects without parameter updates, achieving 95% steering effectiveness while using 50 times fewer samples than existing approaches.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Model Predictive Adversarial Imitation Learning for Planning from Observation

Researchers have developed a new approach called Model Predictive Adversarial Imitation Learning that combines inverse reinforcement learning with model predictive control to enable AI agents to learn from incomplete human demonstrations. The method shows significant improvements in sample efficiency, generalization, and robustness compared to traditional imitation learning approaches.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning

Researchers have developed Curvature-Aware Policy Optimization (CAPO), an algorithm that stabilizes policy-gradient training for large language models and improves sample efficiency by up to 30x. The method uses curvature information to identify and filter problematic training samples, intervening on fewer than 8% of tokens.

AI · Bullish · arXiv – CS AI · 4d ago · 6/10

Ratio-Variance Regularized Policy Optimization

Researchers introduce R²VPO, a new reinforcement learning method that replaces hard clipping mechanisms with ratio-variance regularization to improve policy optimization. Tested across large language models and robotic control tasks, the approach achieves better performance on mathematical reasoning and sample efficiency while maintaining stable learning.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay

Researchers introduce Neuro-Symbolic Experience Replay (NSER), a framework that enhances reinforcement learning by combining Large Language Models with symbolic logic to transform passive memory buffers into active knowledge construction systems. The approach grounds LLM-generated behavioral rules into differentiable logic representations, enabling more efficient policy optimization across multiple benchmark environments.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles

Researchers introduce POETS, a novel framework that optimizes large language models through compute-efficient policy ensembles while quantifying uncertainty. By leveraging KL-regularized Thompson sampling and shared backbone architectures with independent LoRA branches, POETS achieves superior sample efficiency in scientific discovery tasks while reducing computational overhead compared to traditional ensemble methods.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Direct Reasoning Optimization: Token-Level Reasoning Reflectivity Meets Rubric Gates for Unverifiable Tasks

Researchers propose Direct Reasoning Optimization (DRO), a constrained reinforcement learning framework that improves LLM training on unverifiable tasks by combining token-level reasoning rewards with rubric-based feasibility gates. The approach demonstrates faster, more sample-efficient learning across scientific, medical, legal, and financial domains.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning

Researchers introduce Goldilocks, a curriculum learning strategy that improves reinforcement learning efficiency for language models by having a teacher model dynamically select training questions of optimal difficulty for the student model. This addresses the sample inefficiency problem in sparse-reward RL training and demonstrates performance gains on reasoning tasks compared to standard approaches.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Operator-Guided Invariance Learning for Continuous Reinforcement Learning

Researchers propose VPSD-RL, a reinforcement learning framework that discovers value-preserving structures in continuous control tasks using Lie-group operators and diffusion models. The method improves data efficiency and robustness by identifying nonlinear transformations that preserve optimal value functions, addressing brittleness in RL systems under environmental variability.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Extending Differential Temporal Difference Methods for Episodic Problems

Researchers propose a generalization of differential temporal difference (TD) methods that extends their applicability from infinite-horizon to episodic reinforcement learning problems. By addressing how reward centering affects policy optimization in episodic settings, the work maintains theoretical guarantees while empirically demonstrating improved sample efficiency across multiple algorithms and environments.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

EXPO: Stable Reinforcement Learning with Expressive Policies

Researchers introduce EXPO, a reinforcement learning algorithm that trains expressive policies (like diffusion models) more efficiently by avoiding direct value optimization. The method uses a lightweight Gaussian policy to edit actions from a base policy, achieving 2-3x improvements in sample efficiency for both offline-to-online and fine-tuning scenarios.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

Researchers introduce WOMBET, a framework that improves reinforcement learning efficiency in robotics by generating synthetic training data from a world model in source tasks and selectively transferring it to target tasks. The approach combines offline-to-online learning with uncertainty-aware planning to reduce data collection costs while maintaining robustness.

AI · Bullish · arXiv – CS AI · Apr 13 · 6/10

Sample-Efficient Neurosymbolic Deep Reinforcement Learning

Researchers propose a neuro-symbolic deep reinforcement learning approach that integrates logical rules and symbolic knowledge to improve sample efficiency and generalization in RL systems. The method transfers partial policies from simple tasks to complex ones, reducing training data requirements and improving performance in sparse-reward environments compared to existing baselines.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning

Researchers introduce XQC, a deep reinforcement learning algorithm that achieves state-of-the-art sample efficiency by optimizing the critic network's condition number through batch normalization, weight normalization, and distributional cross-entropy loss. The method outperforms existing approaches across 70 continuous control tasks while using fewer parameters.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

Researchers propose EfficientZero-Multitask (EZ-M), a multi-task model-based reinforcement learning algorithm that scales the number of tasks rather than samples per task for robotics training. The approach achieves state-of-the-art performance on HumanoidBench with significantly higher sample efficiency by leveraging shared world models across diverse tasks.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

On Sample-Efficient Generalized Planning via Learned Transition Models

Researchers propose a new approach to generalized planning that learns explicit transition models rather than directly predicting action sequences. This method achieves better out-of-distribution performance with fewer training instances and smaller models compared to Transformer-based planners like PlanGPT.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 4

Sample-efficient and Scalable Exploration in Continuous-Time RL

Researchers introduce COMBRL, a new reinforcement learning algorithm designed for continuous-time systems using nonlinear ordinary differential equations. The algorithm achieves sublinear regret and better sample efficiency compared to existing methods by combining probabilistic models with uncertainty-aware exploration.
