AI · Bullish · arXiv – CS AI · 18h ago · 7/10
GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation
Researchers introduce GEAR-VLA, a Vision-Language-Action framework that improves robotic manipulation by learning geometry-aware action representations. These representations generalize to unseen objects, unseen backgrounds, and different robot embodiments. The system achieves state-of-the-art performance on multiple benchmarks, including a 90.1% success rate on a universal grasping benchmark with 212 previously unseen objects.