arXiv:1711.11248 Abstract | arXiv Analytics

A Closer Look at Spatiotemporal Convolutions for Action Recognition

Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri

Published 2017-11-30, Version 1

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant advantages in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, "R(2+1)D", which gives rise to CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101 and HMDB51.
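
To make the factorization concrete, the sketch below compares the weight count of a full t x d x d 3D convolution with a (2+1)D version: a 1 x d x d spatial convolution into M intermediate channels, followed by a t x 1 x 1 temporal convolution. The abstract does not say how M is chosen; the rule used here, M = floor(t*d^2*N_in*N_out / (d^2*N_in + t*N_out)), is our understanding of the parameter-matching choice in the full paper and should be treated as an assumption. The function names are hypothetical and biases are ignored.

```python
def conv3d_params(t, d, n_in, n_out):
    """Weights of a full t x d x d 3D convolution (biases ignored)."""
    return t * d * d * n_in * n_out


def mid_channels(t, d, n_in, n_out):
    """Intermediate width M chosen so the (2+1)D block roughly matches
    the 3D parameter count (assumed parameter-matching rule)."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv2plus1d_params(t, d, n_in, n_out):
    """Weights of a 1 x d x d spatial conv (n_in -> M) followed by a
    t x 1 x 1 temporal conv (M -> n_out), biases ignored."""
    m = mid_channels(t, d, n_in, n_out)
    spatial = d * d * n_in * m
    temporal = t * m * n_out
    return spatial + temporal


if __name__ == "__main__":
    for t, d, n_in, n_out in [(3, 3, 64, 64), (3, 3, 64, 128)]:
        print(
            f"t={t} d={d} {n_in}->{n_out}: "
            f"3D={conv3d_params(t, d, n_in, n_out)} "
            f"M={mid_channels(t, d, n_in, n_out)} "
            f"(2+1)D={conv2plus1d_params(t, d, n_in, n_out)}"
        )
```

With t = d = 3 and 64 input and output channels, both variants have 110,592 weights (M = 144). Because M is rounded down, the (2+1)D count never exceeds the 3D count, so any accuracy difference between the two blocks is not bought with extra parameters.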

Categories: cs.CV

Keywords: action recognition, spatiotemporal convolutions, closer look, temporal components, 2D CNNs

Related articles:

arXiv:1607.02556 [cs.CV] (Published 2016-07-09)

Action Recognition with Joint Attention on Multi-Level Deep Features

Jialin Wu, Gu Wang, Wukui Yang, Xiangyang Ji

arXiv:1906.06813 [cs.CV] (Published 2019-06-17)

A Temporal Sequence Learning for Action Recognition and Prediction

Sangwoo Cho, Hassan Foroosh

arXiv:1809.03669 [cs.CV] (Published 2018-09-11)

Temporal-Spatial Mapping for Action Recognition

Xiaolin Song, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jingyu Yang, Xiaoyan Sun