y0news

#robot-manipulation News & Analysis

11 articles tagged with #robot-manipulation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10
🧠

What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents

Researchers present a systematic study of hierarchical vision-language-action (Hi-VLA) systems, which pair high-level language-model planners with low-level robot controllers for complex manipulation tasks. The work establishes unified design principles for building such hierarchical robotic agents and shows that well-designed hierarchical systems significantly outperform both flat VLA approaches and naive implementations in simulation and in real-world robot experiments.
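The summary gives no implementation details, but the basic orchestration pattern is easy to picture. The sketch below assumes two hypothetical callables, a `planner` that returns the remaining subtasks for an instruction and a `controller` that executes one subtask and reports success, and shows one common design choice: replanning from current progress after a failure. It illustrates the general hierarchical pattern only, not the paper's system.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: the planner maps (instruction, completed subtasks)
# to the remaining subtasks; the controller executes one subtask and
# reports whether it succeeded.
Planner = Callable[[str, List[str]], List[str]]
Controller = Callable[[str], bool]


def run_hierarchical(instruction: str, planner: Planner, controller: Controller,
                     max_replans: int = 2) -> Tuple[bool, List[str]]:
    """Execute planned subtasks in order, asking the planner again on failure."""
    done: List[str] = []
    replans = 0
    plan = planner(instruction, list(done))
    while plan:
        subtask = plan.pop(0)
        if controller(subtask):
            done.append(subtask)
            continue
        if replans >= max_replans:
            return False, done  # give up after too many replanning attempts
        replans += 1
        plan = planner(instruction, list(done))  # replan from current progress
    return True, done
```

A planner stub that filters a fixed step list against the completed subtasks is enough to exercise the loop; the `max_replans` cap keeps a persistently failing controller from looping forever.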

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Researchers introduce HANDOFF, a humanoid robot whole-body controller that uses distilled multi-teacher learning to enable intuitive task planning and robust manipulation. The system demonstrates real-world feasibility on Unitree G1 robots with natural language task execution, advancing practical deployment of humanoid robots in complex environments.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10
🧠

Continuous Reasoning for Vision-Language-Action

Researchers propose Continuous Reasoning for Vision-Language-Action (VLA), a framework for robotic control that reasons with shared Gaussian latent representations instead of discrete tokens. The approach achieves a 40.4% improvement on robotic manipulation tasks, suggesting that effective AI reasoning for physical control requires verifiable, shareable internal representations rather than explicit language.

AI · Bullish · arXiv – CS AI · May 28 · 7/10
🧠

HumanoidMimicGen: Data Generation for Loco-Manipulation via Whole-Body Planning

Researchers introduce HumanoidMimicGen, a method for automatically generating training data for humanoid robots performing complex locomotion and manipulation tasks. The approach enables imitation learning at scale without labor-intensive teleoperation and yields a 20% performance improvement over models trained solely on real-world demonstrations.
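The name points to the MimicGen family of data-generation methods, whose core step replays object-centric segments of a source demonstration relative to a new object pose. The sketch below illustrates only that generic retargeting step, simplified to planar SE(2) poses; the whole-body planning the summary mentions is not modeled, and `compose`, `invert` and `retarget_segment` are hypothetical names, not the paper's code.

```python
import math
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, heading in radians)


def compose(a: Pose2D, b: Pose2D) -> Pose2D:
    """Return a∘b: pose b, expressed in frame a, mapped into a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def invert(a: Pose2D) -> Pose2D:
    """Inverse transform, so that compose(invert(a), a) is the identity."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, -at)


def retarget_segment(ee_poses: List[Pose2D], obj_src: Pose2D,
                     obj_new: Pose2D) -> List[Pose2D]:
    """Replay end-effector poses recorded near obj_src relative to obj_new."""
    to_obj = invert(obj_src)
    # Express each pose in the source object's frame, then re-anchor it at the new object.
    return [compose(obj_new, compose(to_obj, p)) for p in ee_poses]
```

Because each pose is first expressed relative to the source object, moving or rotating the object in a new scene moves the replayed segment with it.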

AI · Bullish · arXiv – CS AI · May 4 · 7/10
🧠

Thinking in Text and Images: Interleaved Vision–Language Reasoning Traces for Long-Horizon Robot Manipulation

Researchers introduce Interleaved Vision-Language Reasoning (IVLR), a new AI framework that combines textual and visual planning for robotic manipulation tasks. The system generates explicit reasoning traces that alternate between textual subgoals and visual keyframes, achieving a 95.5% success rate on the LIBERO benchmarks and showing that multimodal reasoning significantly outperforms text-only or vision-only approaches.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10
🧠

Making Foresight Actionable: Repurposing Representation Alignment in World Action Models

Researchers introduce AGRA, a new training objective that improves World Action Models (WAMs) for robot manipulation by aligning video diffusion features with semantic representations. It addresses the problem that visually plausible predictions do not always translate into accurate control actions. The method sharpens the action decoder's focus on task-relevant regions and improves robustness to task-irrelevant perturbations in both in-distribution and out-of-distribution scenarios.
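The summary does not give AGRA's formula. For orientation, a generic representation-alignment (REPA-style) loss projects intermediate diffusion features through a small trainable head and maximizes their cosine similarity to frozen encoder features, token by token. The sketch below assumes plain Python lists for features and a hypothetical linear projection `proj` standing in for the head; it is not AGRA's actual objective.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def project(matrix: Sequence[Vector], v: Vector) -> List[float]:
    """Apply a hypothetical linear projection head: matrix @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def alignment_loss(diffusion_feats: Sequence[Vector],
                   encoder_feats: Sequence[Vector],
                   proj: Sequence[Vector]) -> float:
    """Negative mean cosine similarity over paired tokens (lower is better aligned)."""
    sims = [cosine(project(proj, h), y) for h, y in zip(diffusion_feats, encoder_feats)]
    return -sum(sims) / len(sims)
```

In a real model this term would be added to the main training loss with a weighting coefficient, and gradients would flow into the diffusion backbone and the projection head while the semantic encoder stays frozen.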

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10
🧠

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

TempoVLA introduces a controllable speed mechanism for Vision-Language-Action robot models, enabling flexible execution from fast transit to slow precision work. The approach uses trajectory augmentation during training and conditioning mechanisms during inference, allowing a single model to dynamically adjust operational speed based on task risk levels.
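One plausible reading of "trajectory augmentation" is time-rescaling of demonstrations: resampling each trajectory at several speed factors and pairing each copy with its factor so the policy can be conditioned on speed. The sketch below assumes waypoints recorded at a fixed control rate and uses linear interpolation; `retime_trajectory` and `speed_augment` are hypothetical names, not TempoVLA's code.

```python
from typing import List, Sequence, Tuple


def retime_trajectory(traj: Sequence[Sequence[float]], speed: float) -> List[List[float]]:
    """Resample a fixed-rate trajectory so it plays back `speed` times faster.

    speed > 1 yields fewer waypoints (faster at the same control rate);
    speed < 1 yields more waypoints (slower). Uses linear interpolation.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(traj)
    if n < 2:
        return [list(p) for p in traj]
    duration = n - 1  # length of the original trajectory in control steps
    new_len = max(2, round(duration / speed) + 1)
    out: List[List[float]] = []
    for k in range(new_len):
        t = k * duration / (new_len - 1)  # position on the original time axis
        i = min(int(t), n - 2)            # index of the segment containing t
        a = t - i                         # interpolation weight within that segment
        out.append([(1 - a) * x0 + a * x1 for x0, x1 in zip(traj[i], traj[i + 1])])
    return out


def speed_augment(traj: Sequence[Sequence[float]],
                  speeds: Sequence[float] = (0.5, 1.0, 2.0)
                  ) -> List[Tuple[float, List[List[float]]]]:
    """Pair each retimed copy with its speed factor for speed-conditioned training."""
    return [(s, retime_trajectory(traj, s)) for s in speeds]
```

At a fixed control rate, `speed=2.0` roughly halves the number of waypoints, so the same path is traversed in about half the time, while the start and end waypoints are preserved.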

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Researchers introduce PaCo-VLA, a safety framework that shields Vision-Language-Action AI models with passivity-based compliance controls for contact-rich robotic manipulation tasks. The system treats VLA outputs as proposals rather than direct commands, using high-frequency energy monitoring to prevent unsafe interactions while maintaining semantic understanding for tasks like connector insertion.
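A standard way to enforce passivity on top of an untrusted command source is an energy tank: power the robot injects into the environment is drawn from a bounded energy budget, power it absorbs refills that budget, and commands that would overdraw it are scaled down. The sketch below applies this idea to a proposed Cartesian force under those assumptions; it is a generic illustration, not PaCo-VLA's actual shield, and `EnergyTank` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EnergyTank:
    """Passivity energy tank that gates proposed force commands (illustrative)."""
    energy: float        # stored energy budget in joules
    e_min: float = 0.0   # the tank may not be drained below this level
    e_max: float = 5.0   # cap so absorbed energy cannot accumulate without bound

    def filter(self, force: Sequence[float], velocity: Sequence[float],
               dt: float) -> List[float]:
        """Return the proposed force, scaled down if it would overdraw the tank."""
        power = sum(f * v for f, v in zip(force, velocity))  # W delivered to the environment
        if power <= 0.0:
            # Energy flows back into the robot: refill the tank, pass the command through.
            self.energy = min(self.e_max, self.energy - power * dt)
            return [float(f) for f in force]
        needed = power * dt
        available = max(0.0, self.energy - self.e_min)
        # Scaling the force scales the injected power proportionally (velocity held fixed).
        scale = 1.0 if needed <= available else available / needed
        self.energy -= scale * needed
        return [scale * f for f in force]
```

Treating the policy output as a proposal means the tank, not the policy, has the final say: once the budget is spent, energy-injecting commands are attenuated to zero until energy flows back into the robot.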

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠

SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

SpeedAug is a new reinforcement learning framework that accelerates robotic policy execution by learning optimal task speeds rather than relying on conservative demonstration data. The method combines tempo-enriched policy learning with RL fine-tuning to achieve 1.8x faster real-world task throughput while maintaining success rates.

AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠

TAVIS: A Benchmark for Egocentric Active Vision and Anticipatory Gaze in Imitation Learning

Researchers introduce TAVIS, a comprehensive benchmark for evaluating active vision in imitation learning systems, where robotic policies control their own gaze during manipulation tasks. The benchmark includes evaluation protocols, a novel metric (GALT) that measures anticipatory gaze, and baseline experiments showing that the benefits of active vision are task-dependent rather than universal.

🏢 Hugging Face