#robot-learning News & Analysis

32 articles tagged with #robot-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

KITE: Decoupling Kinematics and Interaction for Zero-Shot Cross-Embodiment Manipulation

Researchers introduce KITE, a machine learning framework that decouples task reasoning from embodiment-specific motor control so that manipulation policies trained on one robot type transfer zero-shot to structurally different robots. The approach uses learned latent representations of interaction intent based on contact patterns. A new embodiment requires only kinematic model training, with no new demonstration data.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition

LUCID is a machine learning framework that learns robot manipulation skills from unstructured internet videos and human demonstrations, then transfers this knowledge to different robot embodiments through a shared intent model. The approach eliminates the need for expensive, embodiment-specific robot training data and demonstrates zero-shot transfer capabilities across multiple real-world tasks.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

Researchers introduce NEXT, a neural network method that estimates external joint torques on robot arms without dedicated force sensors, paired with FIRST, a training technique that improves policy learning by 17% across long-horizon tasks. Together they enable cost-effective force-aware teleoperation and manipulation on commodity robots using only 10 minutes of free-motion calibration data.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data

UniDexTok introduces a unified tokenization system that standardizes how different dexterous robotic hands represent their states, enabling cross-embodiment learning from real-world data. By mapping diverse hand kinematics to a shared 22-degree-of-freedom interface, the system achieves sub-millimeter reconstruction accuracy (a 99% improvement over previous approaches) while eliminating the need for simulation or manual retargeting.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets

EgoAERO introduces a framework that lets robots learn dexterous manipulation skills from a single egocentric human video without pre-scanned object assets or CAD models. The system reconstructs hand-object trajectories and converts them into robot policies, and it is supported by a new large-scale dataset (EgoDex-R) containing 4.3M RGB-D frames. Reported performance is comparable to traditional asset-dependent methods.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning

HARBOR is an automated framework that uses specialized AI agents to streamline reinforcement learning workflows for robot training, eliminating manual environment setup, reward shaping, and hyperparameter tuning. Demonstrated across 16 robotic tasks, the system reduces engineering effort while maintaining competitive performance and enabling real-world robot deployment.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

Targeting World Models to Compromise Robot Learning Pipelines

Researchers demonstrate a novel data poisoning attack on the world models used in robot learning pipelines. Malicious prompts or dynamics hidden in the training data activate only when processed through a world model, which then generates unsafe robotic policies. Because the poisoned data appears benign in ground-truth datasets, the attack bypasses traditional safety measures while compromising downstream robot learning systems. It affects both action-conditioned and text-conditioned models.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis

Researchers introduce World-Language-Action (WLA) models, a new class of embodied foundation models that combine world modeling, language reasoning, and action synthesis for robotic control. The WLA-0 prototype reports state-of-the-art performance across multiple benchmarks, achieving 92.94% success on RoboTwin2.0 and 56.5% on RMBench with 40 ms inference on consumer GPU hardware.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Is Diversity All You Need for Scalable Robotic Manipulation?

Researchers challenge the 'more diversity is better' paradigm in robotic manipulation with three findings: task diversity matters more than data quantity, single-embodiment pre-training transfers effectively across platforms, and expert diversity can actually harm learning because of velocity multimodality. Their distribution debiasing method achieves a 15% performance gain, equivalent to using 2.5x more pre-training data.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI data for VLA Training

VISTA is a new framework that improves robot learning by adapting real-world manipulation data collected via Universal Manipulation Interface (UMI) for training Vision-Language-Action (VLA) models. The framework addresses two key challenges: making distorted wrist-mounted camera views compatible with pre-trained vision models and filtering out physically infeasible trajectories before training, resulting in significantly better policy performance.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

From Human Videos to Robot Manipulation: A Survey on Scalable Vision-Language-Action Learning with Human-Centric Data

A comprehensive survey examines how human videos can be leveraged to train Vision-Language-Action (VLA) models for robot manipulation, addressing the limitation that robot demonstrations are expensive and embodiment-specific. The research categorizes four approaches for extracting actionable knowledge from human videos and identifies critical open challenges in video structuring, embodiment transfer, and real-world evaluation.

AI · Neutral · arXiv – CS AI · May 29 · 7/10

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

MiraBench introduces a new evaluation framework for robotic world models that prioritizes action-conditioned reliability over visual fidelity. The benchmark reveals that current AI models struggle to faithfully follow commanded actions and exhibit persistent optimism bias when predicting outcomes of failure-inducing actions.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

HumanEgo is a new AI framework that enables robots to learn manipulation tasks directly from human egocentric videos without requiring robot-specific training data. The system achieves 92.5% success on real-world tasks using just 30 minutes of human video per task and transfers zero-shot across different robot hardware, cameras, and environments.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Turning Video Models into Generalist Robot Policies

Researchers present VERA, a decoupled approach to robot control that separates video prediction from action execution using inverse dynamics models. Rather than fine-tuning video models with action labels, the method keeps the video planner unchanged and trains embodiment-specific models to translate predicted frames into robot actions, enabling zero-shot cross-embodiment generalization.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies

Researchers introduce FineVLA, a framework that enhances Vision-Language-Action models for robotics by incorporating fine-grained instruction supervision beyond simple goal-level commands. The system combines 972,247 trajectories into a curated dataset of 47,159 fine-grained trajectories and demonstrates that mixing fine-grained and coarse instructions improves real-world robot manipulation success rates to 62.7% compared to 49.9% with goal-level instructions alone.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Sword: Style-Robust World Models as Simulators via Dynamic Latent Bootstrapping for VLA Policy Post-Training

Researchers introduce Sword, a world model framework that improves Vision-Language-Action (VLA) models' ability to simulate environments for policy training. By addressing visual style sensitivity and error accumulation in long-horizon predictions, Sword demonstrates significant performance gains on the LIBERO benchmark, advancing the feasibility of training AI agents entirely within simulated environments.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

Researchers introduce EA-WM, an event-aware generative world model that bridges kinematic control and visual perception for robotic systems. By projecting robot actions directly into camera views as structured kinematic-to-visual action fields rather than abstract tokens, the model achieves state-of-the-art performance on the WorldArena benchmark, significantly advancing robot learning and simulation capabilities.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

Researchers introduce Q2RL, a novel algorithm that combines behavior cloning with reinforcement learning to enable robots to improve their policies through online interaction. The method uses Q-value estimation and gating mechanisms to prevent policy degradation from distribution mismatch, achieving 100% success rates on complex manipulation tasks in 1-2 hours of real robot learning.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

Researchers have released RoboCasa365, a large-scale simulation benchmark featuring 365 household tasks across 2,500 kitchen environments with over 600 hours of human demonstration data. The platform is designed to train and evaluate generalist robots for everyday tasks, providing insights into factors affecting robot performance and generalization capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

Researchers introduce Robometer, a new framework for training robot reward models that combines progress tracking with trajectory comparisons to better learn from failed attempts. The system is trained on RBM-1M, a dataset of over one million robot trajectories including failures, and shows improved performance across diverse robotics applications.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning

Researchers introduce PoLAR, a novel latent action representation framework that uses radial-direction structure in hyperbolic space to separately encode transition extent and mode for robot policy learning. The method improves downstream performance across simulation and real-world experiments by leveraging temporal gaps as a proxy for transition magnitude, outperforming existing latent action baselines and vision-language models.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

RARM: Confidence-Gated Progress Reward Modeling for RL in Manipulation

Researchers introduce RARM (Reference-Anchored Reward Model), a visual AI system that solves a major bottleneck in robot learning by converting single successful demonstrations into dense reward signals without task-specific engineering. The approach uses confidence-gated progress matching to avoid false-positive rewards, achieving superior performance across simulated and real-world manipulation tasks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

Researchers propose RECALL, an active learning framework for Vision-Language-Action (VLA) models that uses uncertainty-guided data collection to improve robot learning efficiency. Targeted recovery demonstrations outperform passive imitation learning, but the approach also exposes catastrophic forgetting when new data is not balanced with retention mechanisms.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

CoorDex: Coordinating Body and Hand Priors for Continuous Dexterous Humanoid Loco-Manipulation

Researchers introduce CoorDex, a learning pipeline that enables humanoid robots to perform complex dexterous manipulation tasks while continuously moving, rather than stopping to grasp objects. The system coordinates high-dimensional body and hand control through latent priors and residual reinforcement learning, demonstrated on a Unitree G1 humanoid with a 20-DOF hand performing tasks like in-motion bottle grasping and fridge operation.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

Researchers developed Physical Atari, an affordable robotic system that applies reinforcement learning algorithms to physical Atari game controllers in real-world conditions. Built for under $1,000 using consumer-grade components and 3D-printed parts, the system has demonstrated weeks of continuous operation while revealing significant performance degradation from even minor distribution shifts between training and deployment environments.