arXiv:2205.06160 Abstract | arXiv Analytics

arXiv:2205.06160 [cs.CV]

Localized Vision-Language Matching for Open-vocabulary Object Detection

Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

Published 2022-05-12, Version 1

In this work, we propose an open-world object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It uses a two-stage training approach: first, it uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly supervised manner; second, it specializes the model for the object detection task using known-class annotations. We show that a simple language model is better suited than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit the information in image-caption pairs. Our method compares favorably to existing open-world detection approaches while being data-efficient.
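To make the idea of weakly supervised image-caption matching more concrete, here is a minimal sketch of a generic region-word grounding score and a batch-contrastive matching loss, in the style of caption-based pretraining such as OVR-CNN (Zareian et al., arXiv:2011.10678). It is not the authors' exact formulation. The function names, the dot-product similarity, the softmax attention over regions, and the toy embeddings are all assumptions for illustration, and the sketch omits the paper's location guidance and consistency regularization.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grounding_score(regions: List[Vector], words: List[Vector]) -> float:
    """Assumed image-caption score: each caption word attends over image
    regions via a softmax on region-word similarity; the attended
    similarities are then averaged over the caption's words."""
    per_word = []
    for w in words:
        sims = [dot(r, w) for r in regions]
        attn = softmax(sims)
        per_word.append(sum(a * s for a, s in zip(attn, sims)))
    return sum(per_word) / len(per_word)


def matching_loss(images: List[List[Vector]], captions: List[List[Vector]]) -> float:
    """Batch contrastive loss: image k should score highest with caption k.
    Averages the image-to-caption and caption-to-image cross-entropy terms."""
    n = len(images)
    scores = [[grounding_score(images[i], captions[j]) for j in range(n)]
              for i in range(n)]
    loss = 0.0
    for k in range(n):
        row = softmax(scores[k])                          # image k vs. all captions
        col = softmax([scores[i][k] for i in range(n)])   # caption k vs. all images
        loss += -math.log(row[k]) - math.log(col[k])
    return loss / (2 * n)


if __name__ == "__main__":
    # Hypothetical toy batch: each image is a list of 2-D "region features",
    # each caption a list of 2-D "word embeddings" (made-up numbers).
    images = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
    captions = [[[1.0, 0.0]], [[0.0, 1.0], [0.0, 0.8]]]
    print("matched pairs:", matching_loss(images, captions))
    print("swapped pairs:", matching_loss(images, captions[::-1]))
```

Because only image-level caption supervision is used, no box annotations are needed in this stage: the loss is lower when each image is paired with its own caption than when the pairs are swapped, which pushes region features toward the embeddings of the words that describe them.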

Categories: cs.CV, cs.LG

Keywords: open-vocabulary object detection, localized vision-language matching, better exploit image-caption pair information, simple language model fits better, detect novel object classes
