#policy-learning News & Analysis

45 articles tagged with #policy-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, a machine learning architecture that combines imitation learning with self-supervised learning to improve policy representation in AI decision-making systems. By jointly predicting action sequences and latent observation sequences rather than raw inputs, the model achieves up to a 40% improvement in world-model understanding and 10% higher task success rates.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

MotionPyramid: Hierarchical Motion Representation and Residual Interfaces

MotionPyramid introduces a hierarchical action representation for humanoid control that learns motion structure from data, organizing behaviors across temporal scales from immediate motor commands to complex skills. The system uses frozen pretrained hierarchies as reusable action interfaces for reinforcement learning, with residual interfaces allowing policies to blend coarse and fine-grained control, demonstrating that motion can be organized like perceptual hierarchies.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

MemoryVAM: Integrating Memory into Video Action Model for Robot Manipulation

MemoryVAM introduces an episodic memory mechanism for video-world-model policies that enables robots to perform long-horizon manipulation tasks by retaining and leveraging historical context. The system achieves significant performance improvements on benchmark tasks and real robot experiments, addressing a fundamental limitation where short observation windows make complex manipulation non-Markovian.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

Researchers introduce NEXT, a neural network method that estimates external joint torques on robot arms without dedicated force sensors, paired with FIRST, a training technique that improves policy learning by 17% across long-horizon tasks. The approach enables cost-effective force-aware teleoperation and manipulation on commodity robots using only 10 minutes of free-motion calibration data.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

EgoAERO introduces a framework enabling robots to learn dexterous manipulation skills from single egocentric human videos without requiring pre-scanned object assets or CAD models. The system reconstructs hand-object trajectories and converts them into robot policies, supported by a new large-scale dataset (EgoDex-R) containing 4.3M RGB-D frames, achieving performance comparable to traditional asset-dependent methods.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies

Researchers introduce ActProbe, a lightweight failure detection system for generative robot policies that analyzes action signals to predict failures before they occur. The method improves failure detection accuracy by 12.7% over existing approaches and demonstrates real-world effectiveness on robot manipulation tasks.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Is Diversity All You Need for Scalable Robotic Manipulation?

Researchers challenge the 'more diversity is better' paradigm in robotic manipulation by demonstrating that task diversity matters more than data quantity, that single-embodiment pre-training transfers effectively across platforms, and that expert diversity can actually harm learning because of velocity multimodality. Their distribution debiasing method achieves a 15% performance gain, equivalent to using 2.5x more pre-training data.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning

DiffAero is a GPU-accelerated simulation framework that enables efficient quadrotor control policy learning through fully differentiable physics and rendering. The framework demonstrates significant performance improvements over existing simulators, achieving robust flight policy training on consumer hardware in hours rather than days, with code publicly available for research adoption.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

PerceptTwin is an automated pipeline that generates interactive 3D simulations from robot perception data, enabling LLM-based planners to validate and refine strategies before hardware execution. The system improves plan success rates by approximately 39% and enhances safety through semantic scene reconstruction and LLM verification mechanisms.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Offline Reinforcement Learning with Generative Trajectory Policies

Researchers propose Generative Trajectory Policies (GTPs), a unified framework for offline reinforcement learning that bridges the performance gap between slow diffusion models and fast consistency policies by learning continuous-time generative trajectories. The approach achieves state-of-the-art results on D4RL benchmarks, including perfect scores on difficult AntMaze tasks.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

HumanEgo is a new AI framework that enables robots to learn manipulation tasks directly from human egocentric videos without requiring robot-specific training data. The system achieves 92.5% success on real-world tasks using just 30 minutes of human video per task and transfers zero-shot across different robot hardware, cameras, and environments.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

Researchers introduce FineVLA, a framework that enhances Vision-Language-Action models for robotics by incorporating fine-grained instruction supervision beyond simple goal-level commands. The system combines 972,247 trajectories into a curated dataset of 47,159 fine-grained trajectories and demonstrates that mixing fine-grained and coarse instructions improves real-world robot manipulation success rates to 62.7% compared to 49.9% with goal-level instructions alone.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

NanoResearch introduces a multi-agent LLM framework that personalizes research automation through three co-evolving components: a skill bank for reusable procedural knowledge, a memory module for user-specific experience, and label-free policy learning for preference internalization. The system addresses the gap between uniform AI outputs and diverse researcher needs, demonstrating substantial improvements over existing AI research systems while reducing costs across successive cycles.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

Researchers introduce HCL-GP, a machine learning approach that enables large language model agents to learn and reuse hierarchical task decompositions for improved performance on complex applications. The method achieves 98.2% accuracy on standard tasks and demonstrates significant improvements over static synthesis approaches, particularly benefiting open-source models through dynamic component reuse.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Milestone-Guided Policy Learning for Long-Horizon Language Agents

Researchers introduce BEACON, a milestone-guided policy learning framework that significantly improves training efficiency for long-horizon language agents by solving credit misattribution and sample inefficiency problems. The approach achieves 92.9% success rates on complex tasks—nearly double previous benchmarks—while improving sample utilization from 23.7% to 82.0%.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Researchers introduce Q2RL, a novel algorithm that combines behavior cloning with reinforcement learning to enable robots to improve their policies through online interaction. The method uses Q-value estimation and gating mechanisms to prevent policy degradation from distribution mismatch, achieving 100% success rates on complex manipulation tasks in 1-2 hours of real robot learning.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents

Researchers propose Traversal-as-Policy, a method that distills AI agent execution logs into Gated Behavior Trees (GBTs) to create safer, more efficient autonomous agents. The approach significantly improves success rates while reducing safety violations and computational costs across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

VITA: Vision-to-Action Flow Matching Policy

Researchers developed VITA, a new AI framework that streamlines robot policy learning by directly flowing from visual inputs to actions without requiring conditioning modules. The system achieves 1.5-2x faster inference speeds while maintaining or improving performance compared to existing methods across 14 simulation and real-world robotic tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Dense-Jump Flow Matching with Non-Uniform Time Scheduling for Robotic Policies: Mitigating Multi-Step Inference Degradation

Researchers developed a new robotic policy framework using dense-jump flow matching with non-uniform time scheduling to address performance degradation in multi-step inference. The approach achieves up to 23.7% performance gains over existing baselines by optimizing integration scheduling during training and inference phases.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

Learning Action Priors for Cross-embodiment Robot Manipulation

Researchers propose a two-stage training framework for Vision-Language-Action (VLA) models that pretrains the action module with motion priors before multimodal alignment. This approach enables robots to learn temporal dynamics more efficiently and generalizes better across different embodiments and real-world tasks with limited data.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

SQLConductor: Search-to-Policy Learning for Step-wise Text-to-SQL Orchestration

SQLConductor is a new AI framework that improves Text-to-SQL systems—tools that convert natural language queries into database commands—by using adaptive, step-wise orchestration rather than fixed pipelines. The system achieves 73.2% execution accuracy on complex database queries while using smaller, frozen models, suggesting significant efficiency gains for database accessibility applications.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

DASIP: Dynamic Test-Time Compute Scaling for Robot Control with Stochastic Interpolant Policies

Researchers introduce DA-SIP, a dynamic inference framework for robotic control that adaptively adjusts computational resources based on task difficulty. The approach reduces inference time by 2.6-4.4x while maintaining performance, addressing the computational inefficiency of fixed-budget diffusion and flow-based policies in robotics.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

Researchers developed a three-stage pipeline to automatically extract skill libraries from computer-using agent interaction data, achieving high readability (95% purity on labeled benchmarks) but failing to improve downstream policy performance across domains. The study reveals that while trajectory mining can expose interpretable skill structure, current technical limitations prevent reliable cross-domain transfer improvements.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

DuoBench: A Reproducible Benchmark for Bimanual Manipulation in Simulation and the Real World

Researchers introduce DuoBench, a comprehensive benchmarking framework for evaluating bimanual robotic manipulation policies on the FR3 Duo platform. The framework includes eleven tasks implemented in simulation and real-world settings, with reproducible recipes and human-teleoperated datasets that reveal significant challenges in current dual-arm AI policies, particularly in coordination and sim-to-real transfer.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Implicit Neural Representations of Individual Behavior

Researchers introduce Behavioral INR, a self-supervised machine learning model that learns to identify and represent different behavioral policies from unlabeled multi-policy data by adapting implicit neural representations from computer vision. The approach shows promise in robotics, gaming, and racing datasets where mixed behaviors lack annotations, particularly excelling in continuous state-action environments with variable episode lengths.
