arXiv:1712.06317 Abstract | arXiv Analytics

arXiv:1712.06317 [cs.CV]

Spatial-Temporal Memory Networks for Video Object Detection

Published 2017-12-18, Version 1

We introduce Spatial-Temporal Memory Networks (STMN) for video object detection. At its core, we propose a novel Spatial-Temporal Memory module (STMM) as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM's design enables the integration of ImageNet pre-trained backbone CNN weights for both the feature stack and the prediction head, which we find to be critical for accurate detection. Furthermore, to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. We compare our method to state-of-the-art detectors on ImageNet VID, and conduct ablation studies to dissect the contribution of our different design choices. We obtain state-of-the-art results with the VGG backbone, and competitive results with the ResNet backbone. To our knowledge, this is the first video object detector that is equipped with an explicit memory mechanism to model long-term temporal dynamics.

Categories: cs.CV

Keywords: video object detection, spatial-temporal memory networks, imagenet pre-trained backbone cnn weights, novel spatial-temporal memory module, model long-term temporal appearance

Related articles:

arXiv:2009.09660 [cs.CV] (Published 2020-09-21)

Feature Flow: In-network Feature Flow Estimation for Video Object Detection

Ruibing Jin, Guosheng Lin, Changyun Wen, Jianliang Wang, Fayao Liu

arXiv:1602.08465 [cs.CV] (Published 2016-02-26)

Seq-NMS for Video Object Detection

Wei Han et al.

arXiv:1712.05896 [cs.CV] (Published 2017-12-16)

Impression Network for Video Object Detection

Congrui Hetang, Hongwei Qin, Shaohui Liu, Junjie Yan
