LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination
Researchers introduce LIBERO-Occ, a benchmark for evaluating Vision-Language-Action (VLA) models under object occlusion in robotic manipulation tasks. They propose Viewpoint Imagination (VIM), a technique that generates synthetic alternative viewpoints to improve model robustness when task-relevant objects are partially hidden, achieving performance gains without requiring additional cameras.
Vision-Language-Action models represent a frontier in embodied AI, combining visual perception with language understanding to control robotic systems. However, current VLA evaluations typically assume that all task-relevant objects remain fully visible, a condition rarely met in real-world manipulation environments. This research identifies scene-induced occlusion as a critical failure mode that causes substantial performance degradation in state-of-the-art models, exposing a gap between benchmark performance and practical deployment requirements.
LIBERO-Occ extends existing robotic manipulation benchmarks with systematic occlusion scenarios, giving researchers the infrastructure needed to develop more robust systems. This addresses a fundamental challenge in embodied AI: the transition from controlled laboratory settings to unpredictable real-world conditions where partial observability is inevitable. The benchmark covers multiple occlusion types and severity levels, so model failures can be traced to specific occlusion conditions rather than reported as a single aggregate score.
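One common way to make "severity level" concrete is to measure what fraction of a target object's pixels is hidden from the camera. The sketch below illustrates that idea only; it is not taken from the LIBERO-Occ code, and the function names, the mask representation, and the 0.25 / 0.60 thresholds are all hypothetical choices made for illustration.

```python
from typing import List

# A mask is a 2D grid of booleans: True where the object covers a pixel.
Mask = List[List[bool]]


def occlusion_ratio(full_mask: Mask, visible_mask: Mask) -> float:
    """Return the fraction of the object's pixels that are hidden.

    full_mask    -- the object's silhouette if nothing were in front of it
    visible_mask -- the pixels of the object the camera actually sees
    """
    total = sum(cell for row in full_mask for cell in row)
    if total == 0:
        raise ValueError("object does not appear in the full mask")
    # Count only visible pixels that really belong to the object.
    visible = sum(
        f and v
        for f_row, v_row in zip(full_mask, visible_mask)
        for f, v in zip(f_row, v_row)
    )
    return 1.0 - visible / total


def severity_level(ratio: float) -> str:
    """Bucket an occlusion ratio into a coarse label (hypothetical thresholds)."""
    if ratio < 0.25:
        return "light"
    if ratio < 0.60:
        return "moderate"
    return "heavy"


# Toy example: the object covers 4 pixels, and only 1 of them is visible.
full = [[False, True, True, False],
        [False, True, True, False]]
visible = [[False, True, False, False],
           [False, False, False, False]]

ratio = occlusion_ratio(full, visible)  # 0.75
print(ratio, severity_level(ratio))
```

Reporting success rates per bucket, instead of one pooled number, is what lets a benchmark of this kind show where a policy starts to break down.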
Viewpoint Imagination (VIM) tackles occlusion by drawing on generative capabilities within existing VLA architectures. Rather than requiring hardware modifications or additional sensors at deployment, VIM synthesizes complementary viewpoints computationally, in effect completing the model's perception with learned, imagined views. The approach suggests that multimodal models can work around observability constraints through internal generation rather than external infrastructure.
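The control flow described above can be sketched as a thin wrapper: generate an imagined view from the single observed image, then let the policy act on both. This is a minimal illustration under stated assumptions, not the authors' implementation. `imagine_view`, `policy`, and `act_with_imagined_view` are hypothetical names, and the summary does not say whether the imagined view comes from the same network as the policy or from a separate module; they are separate callables here only for clarity.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]    # toy stand-in for a camera frame
Action = Tuple[float, ...]   # toy stand-in for a robot action vector


def act_with_imagined_view(
    observed: Image,
    instruction: str,
    imagine_view: Callable[[Image, str], Image],
    policy: Callable[[Sequence[Image], str], Action],
) -> Action:
    """Choose an action from one real view plus one synthesized view.

    No second camera is involved: the extra view is generated from the
    observed image and the instruction.
    """
    imagined = imagine_view(observed, instruction)
    return policy([observed, imagined], instruction)


# Hypothetical stubs so the sketch runs end to end.
def mirror_view(image: Image, instruction: str) -> Image:
    # Placeholder "generator": flips the image horizontally.
    return [row[::-1] for row in image]


def count_views_policy(views: Sequence[Image], instruction: str) -> Action:
    # Placeholder "policy": returns the number of views it received.
    return (float(len(views)),)


action = act_with_imagined_view(
    [[0.0, 1.0], [0.5, 0.2]],
    "pick up the mug",
    mirror_view,
    count_views_policy,
)
print(action)
```

In a real system the two stubs would be replaced by learned components; the wrapper only shows that the deployment-time input remains a single camera image.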
For the robotics and embodied AI community, this work establishes occlusion robustness as a measurable, improvable objective. Organizations developing VLA systems for real-world applications, particularly in warehousing, manufacturing, and home automation, gain both a diagnostic benchmark and a mitigation strategy with reported performance gains. The publicly released benchmark and code should lower the barrier to adopting occlusion-aware training methods across the field.
- VLA models experience significant performance degradation under object occlusion, revealing a critical gap between benchmark and real-world conditions
- Viewpoint Imagination generates synthetic alternative viewpoints to improve manipulation robustness without additional deployment-time hardware
- LIBERO-Occ benchmark systematically evaluates occlusion across multiple types and severity levels, enabling standardized robustness assessment
- VIM improves performance across diverse task suites and occlusion scenarios, suggesting generative perception completion as a scalable approach
- Open-source release accelerates adoption of occlusion-aware methods in embodied AI systems for practical robotic applications