#robot-manipulation News & Analysis

15 articles tagged with #robot-manipulation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

MemoryVAM: Integrating Memory into Video Action Model for Robot Manipulation

MemoryVAM introduces an episodic memory mechanism for video-world-model policies that enables robots to perform long-horizon manipulation tasks by retaining and leveraging historical context. The system achieves significant performance improvements on benchmark tasks and real robot experiments, addressing a fundamental limitation where short observation windows make complex manipulation non-Markovian.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Human Universal Grasping

Researchers present HUG, a flow-matching model trained on 1M human grasping demonstrations that generates diverse, natural robot grasps from RGB-D images. The system outperforms existing baselines by 23-34% on real-world robotic grasping tasks and can be retargeted to various robot hands, narrowing the generalization gap in robotic manipulation.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents

Researchers present a systematic study of hierarchical vision-language-action (Hi-VLA) systems that combine high-level language model planners with low-level robot controllers for complex manipulation tasks. The work establishes unified design principles for building these hierarchical robotic agents and demonstrates that thoughtfully designed hierarchical systems significantly outperform both flat VLA approaches and naive implementations across simulation and real-world robot experiments.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Researchers introduce HANDOFF, a humanoid robot whole-body controller that uses distilled multi-teacher learning to enable intuitive task planning and robust manipulation. The system demonstrates real-world feasibility on Unitree G1 robots with natural language task execution, advancing practical deployment of humanoid robots in complex environments.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Continuous Reasoning for Vision-Language-Action

Researchers propose Continuous Reasoning for Vision-Language-Action (VLA), a framework that uses shared Gaussian latent representations instead of discrete tokens for robotic control. The approach achieves a 40.4% improvement on robotic manipulation tasks, suggesting that effective AI reasoning for physical control requires verifiable, shareable internal representations rather than explicit language.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

HumanoidMimicGen: Data Generation for Loco-Manipulation via Whole-Body Planning

Researchers introduce HumanoidMimicGen, a method for automatically generating training data for humanoid robots performing complex locomotion and manipulation tasks. The approach enables imitation learning at scale without labor-intensive teleoperation, achieving a 20% performance improvement over models trained solely on real-world demonstrations.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

Researchers introduce Interleaved Vision-Language Reasoning (IVLR), a new AI framework that combines text and visual planning for robotic manipulation tasks. The system generates explicit reasoning traces alternating between textual subgoals and visual keyframes, achieving 95.5% success on LIBERO benchmarks and demonstrating that multimodal reasoning significantly outperforms text-only or vision-only approaches.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

Learning Action Priors for Cross-embodiment Robot Manipulation

Researchers propose a two-stage training framework for Vision-Language-Action (VLA) models that pretrains the action module with motion priors before multimodal alignment. This approach enables robots to learn temporal dynamics more efficiently and generalizes better across different embodiments and real-world tasks with limited data.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Temporal Self-Imitation Learning

Researchers introduce Temporal Self-Imitation Learning (TSIL), a reinforcement learning framework that improves robot manipulation training by identifying and reusing efficient successful trajectories as self-supervision signals. The approach outperforms traditional reward-shaping methods across 15 long-horizon tasks by leveraging temporal efficiency as an intrinsic learning signal rather than relying solely on manually engineered rewards.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

Making Foresight Actionable: Repurposing Representation Alignment in World Action Models

Researchers introduce AGRA, a new objective function that improves World Action Models (WAMs) for robot manipulation by aligning video diffusion features with semantic representations. It targets the problem that visually plausible predictions do not necessarily translate into accurate control actions. The method sharpens the action decoder's focus on task-relevant regions and improves robustness to task-irrelevant perturbations in both in-distribution and out-of-distribution scenarios.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Sample-efficient Low-level Motion Planning for Robotic Manipulation Tasks via Zero-shot Transfer Learning

Researchers propose iCEM+TL, a framework combining the Cross-Entropy Method with transfer learning to make robotic manipulation planning more sample-efficient. The approach improves success rates by up to 23% on complex tasks such as stacking and shelf placement, and is validated on a real Franka Emika robot.
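
For readers unfamiliar with the Cross-Entropy Method named in this summary, the following is a minimal sketch of plain CEM on a toy problem. It is not the paper's iCEM+TL method and includes no transfer learning; the function names, the quadratic `toy_cost`, and all hyperparameters are assumptions chosen for illustration.

```python
import random
import statistics


def cross_entropy_method(cost, dim, iterations=30, population=64,
                         elite_frac=0.125, seed=0):
    """Plain CEM: sample candidates, keep the best, refit the sampler."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # initial guess for each dimension (assumed)
    std = [1.0] * dim   # initial sampling spread (assumed)
    n_elite = max(2, int(population * elite_frac))

    for _ in range(iterations):
        # Draw a population from the current diagonal Gaussian.
        samples = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(population)
        ]
        # Keep the lowest-cost candidates as the elite set.
        samples.sort(key=cost)
        elites = samples[:n_elite]
        # Refit mean and spread to the elites, one dimension at a time.
        for d in range(dim):
            column = [s[d] for s in elites]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # avoid zero spread
    return mean


def toy_cost(x):
    """Hypothetical stand-in for a planning cost: squared distance to a target."""
    target = [0.5, -0.3]
    return sum((a - b) ** 2 for a, b in zip(x, target))


best = cross_entropy_method(toy_cost, dim=2)
print(best)
```

In a motion-planning setting the sampled vector would typically be an action sequence and the cost would come from rolling it out in a model or simulator; the toy quadratic here only stands in for that step.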

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

TempoVLA introduces a controllable speed mechanism for Vision-Language-Action robot models, enabling flexible execution from fast transit to slow precision work. The approach uses trajectory augmentation during training and conditioning mechanisms during inference, allowing a single model to dynamically adjust operational speed based on task risk levels.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Researchers introduce PaCo-VLA, a safety framework that shields Vision-Language-Action AI models with passivity-based compliance controls for contact-rich robotic manipulation tasks. The system treats VLA outputs as proposals rather than direct commands, using high-frequency energy monitoring to prevent unsafe interactions while maintaining semantic understanding for tasks like connector insertion.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

SpeedAug is a new reinforcement learning framework that accelerates robotic policy execution by learning optimal task speeds rather than relying on conservative demonstration data. The method combines tempo-enriched policy learning with RL fine-tuning to achieve 1.8x faster real-world task throughput while maintaining success rates.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning

Researchers introduce TAVIS, a benchmark for evaluating active vision in imitation learning systems where robotic policies control their own gaze during manipulation tasks. The benchmark includes evaluation protocols, a novel metric (GALT) measuring anticipatory gaze, and baseline experiments showing that the benefits of active vision are task-dependent rather than universal.