AI · Neutral · arXiv – CS AI · 7h ago · 6/10
PInVerify: An Offline Embodied Benchmark for Active Instance Verification
Researchers introduce PInVerify, an offline benchmark for training embodied AI agents to verify whether an object matches a fine-grained description through active viewpoint selection. The benchmark comprises 3,000 episodes across 18 object categories and evaluates multimodal language models at on-device scale. The best results, from fine-tuned approaches, reach 85.6% accuracy.