AI | Bullish | Importance 6/10

CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning

arXiv – CS AI | Markus Knauer, Valentin Gieraths, Tai Mai, Samuel Bustamante, Alin Albu-Schäffer, Freek Stulp, João Silvério
AI Summary

CLASP is a modular robotic system that combines task-parameterized learning with vision-language models to enable robots to understand natural language commands while maintaining data efficiency. The approach achieves 73-100% success rates on manipulation tasks by learning skills from minimal demonstrations and composing them dynamically without fine-tuning the underlying models.

Analysis

CLASP addresses a fundamental challenge in robotics: bridging the gap between data-efficient skill learning and intuitive natural language interaction. Foundation models such as vision-language models (VLMs) and vision-language-action models (VLAs) provide natural language grounding but demand extensive training data, while task-parameterized imitation learning achieves efficiency through minimal demonstrations but lacks language understanding. The work argues that neither approach alone is sufficient; a hybrid architecture that pairs the language capabilities of pretrained models with data-efficient skill learning yields a more practical system.

The technical contribution centers on combining task-parameterized kernelized movement primitives (TP-KMPs) with pretrained vision-language models. During the learning phase, robots acquire skills from just 2-5 kinesthetic demonstrations, with the VLM automatically generating schema descriptions of parameters and preconditions. Crucially, this process requires no fine-tuning of the foundation model, preserving its general knowledge while specializing its application.
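To make the idea of a generated skill schema concrete, here is a minimal sketch of what such a record might hold. The summary only says that schemas describe parameters and preconditions; the class name, field names, and the example "pour" skill below are all hypothetical and are not taken from the CLASP paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SkillSchema:
    """Hypothetical record for one learned skill; all field names are illustrative."""

    name: str
    description: str
    # Parameter name -> plain-language description of what it must be bound to.
    parameters: Dict[str, str]
    # Conditions that should hold before the skill is executed.
    preconditions: List[str] = field(default_factory=list)
    # How many kinesthetic demonstrations the skill was learned from.
    num_demonstrations: int = 0

    def missing_bindings(self, bindings: Dict[str, str]) -> List[str]:
        """Return the parameters that a command has not bound yet."""
        return [p for p in self.parameters if p not in bindings]


# Assumed example skill, learned from 3 demonstrations (within the 2-5 range).
pour = SkillSchema(
    name="pour",
    description="Pour the contents of a held container into a target container.",
    parameters={
        "source": "container held by the gripper",
        "target": "container to pour into",
    },
    preconditions=["source is grasped", "target is upright"],
    num_demonstrations=3,
)

# A command that only names the source still needs a target.
unbound = pour.missing_bindings({"source": "cup"})
```

A structure of this kind would give a language model something explicit to reason over when selecting a skill and binding its parameters, without touching the model's weights.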

The execution pipeline showcases practical reasoning: the VLM interprets natural language commands to select appropriate skills, bind parameters to task specifics, and compose multiple skills into novel behaviors through covariance weighting. When capability gaps emerge, the system identifies exactly which demonstrations are needed, enabling targeted active learning rather than generic data collection.
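The summary does not spell out how covariance weighting is computed. One common reading, used widely with probabilistic movement primitives, is a precision-weighted product of Gaussians: each skill predicts a mean and a variance at a given time step, and the skill that is more certain dominates the blend. The sketch below assumes diagonal covariances for simplicity; the function name and the numbers are illustrative and are not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def fuse_diagonal_gaussians(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse per-skill Gaussian predictions for one time step.

    Assumes diagonal covariances, so each dimension is fused independently
    by inverse-variance (precision) weighting.
    """
    dims = len(means[0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dims):
        # Precision is the inverse of variance: low variance means high weight.
        precisions = [1.0 / v[d] for v in variances]
        total = sum(precisions)
        fused_var.append(1.0 / total)
        fused_mean.append(
            sum(p * m[d] for p, m in zip(precisions, means)) / total
        )
    return fused_mean, fused_var


# Skill A is confident along x, skill B is confident along y (made-up values).
mean, var = fuse_diagonal_gaussians(
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[0.01, 1.0], [1.0, 0.01]],
)
```

In this example the fused mean stays close to skill A along x and close to skill B along y, which is the intended effect: each skill controls the dimensions where its demonstrations were consistent.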

For the robotics industry, CLASP represents progress toward more accessible robot programming. The 73-100% success rates across skill selection, composition, and active learning scenarios suggest practical viability. The approach reduces the typical trade-off between sample efficiency and interpretability, enabling robots to work with limited data while responding to natural human commands, a critical requirement for real-world deployment in varied environments.

Key Takeaways
  • CLASP combines task-parameterized learning with pretrained VLMs to achieve both data efficiency and natural language grounding without fine-tuning.
  • The system learns manipulation skills from just 2-5 kinesthetic demonstrations, generating skill schemas automatically via vision-language models.
  • Novel task behaviors are created through covariance-weighted composition of existing skills, expanding capability beyond learned primitives.
  • Active learning identifies capability gaps and requests targeted demonstrations, optimizing data collection efficiency.
  • Validation on a 7-DoF manipulator demonstrates 73-100% success rates across skill selection, composition, and continuous learning scenarios.