From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Researchers have developed a framework that enables open vocabulary object detection models to operate in real-world settings by identifying and learning previously unseen objects. The method introduces techniques called Open World Embedding Learning (OWEL) and Multi-Scale Contrastive Anchor Learning (MSCAL) to detect unknown objects and reduce misclassification errors.