PANDO: Efficient Multimodal AI Agents via Online Skill Distillation
PANDO introduces an efficient multimodal AI agent framework that improves performance while reducing computational costs through online skill distillation, achieving 58.3% success on VisualWebArena tasks with 58-61% fewer tokens than competing approaches. The system addresses inefficiencies in web agent design by maintaining a skill library and employing hierarchical routing, visual compression, and cache-aware prompting without requiring expensive pre-evaluation.