arXiv:2009.09660 Abstract | arXiv Analytics

arXiv:2009.09660 [cs.CV]

Feature Flow: In-network Feature Flow Estimation for Video Object Detection

Ruibing Jin, Guosheng Lin, Changyun Wen, Jianliang Wang, Fayao Liu

Published 2020-09-21, Version 1

Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly at the feature level. Since the displacement of a feature vector is not consistent with the pixel displacement, a common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset. With this method, the fine-tuned network is expected to produce tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce feature flow, which indicates the feature displacement. Our IFF module consists of a shallow module that shares features with the detection branches. This compact design enables our IFF-Net to detect objects accurately while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves state-of-the-art performance on ImageNet VID.
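The abstract describes feature flow only as a field indicating feature displacement and does not say how IFF-Net applies it. As a minimal sketch of one common use of such a field in flow-based video detectors, the snippet below warps a single-channel feature map by a per-location displacement using bilinear interpolation. The function name `warp_features`, the `flow_x`/`flow_y` layout, and the border clamping are all illustrative assumptions, not the authors' implementation.

```python
import math


def warp_features(feat, flow_x, flow_y):
    """Warp one feature channel by a displacement field (illustrative only).

    feat, flow_x, flow_y: lists of H rows with W floats each.
    The output at (y, x) samples feat at (y + flow_y, x + flow_x),
    using bilinear interpolation and clamping samples to the border.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Source location, clamped so it stays inside the map.
            sx = min(max(x + flow_x[y][x], 0.0), w - 1.0)
            sy = min(max(y + flow_y[y][x], 0.0), h - 1.0)
            # Integer corners surrounding the source location.
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            # Fractional offsets act as interpolation weights.
            wx, wy = sx - x0, sy - y0
            top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
            bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
            out[y][x] = top * (1 - wy) + bottom * wy
    return out
```

With an all-zero field the map is returned unchanged, and a constant horizontal displacement of 1 makes each location take the value of its right-hand neighbour (the last column repeats because of clamping).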

Categories: cs.CV

Keywords: video object detection, in-network feature flow estimation, tensors encoding feature-level motion, encoding feature-level motion information

Related articles:

arXiv:2009.07498 [cs.CV] (Published 2020-09-16)

Dual Semantic Fusion Network for Video Object Detection

Lijian Lin, Haosheng Chen, Honglun Zhang, Jun Liang, Yu Li, Ying Shan, Hanzi Wang

arXiv:1602.08465 [cs.CV] (Published 2016-02-26)

Seq-NMS for Video Object Detection

Wei Han et al.

arXiv:1712.06317 [cs.CV] (Published 2017-12-18)

Spatial-Temporal Memory Networks for Video Object Detection

Fanyi Xiao, Yong Jae Lee
