arXiv:1707.01408 [cs.CV]

Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification

Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann

Published 2017-07-05, Version 1

We report on CMU Informedia Lab's system used in Google's YouTube 8 Million Video Understanding Challenge. Our pipeline achieved 84.675% and 84.662% GAP on our evaluation split and the official test set, respectively. We attribute the good performance to three components: 1) refined video representation learning with residual links and hypercolumns; 2) latent concept mining, which captures interactions among concepts; and 3) learning with temporal segmentation and a weighted multi-model ensemble. We conduct experiments to validate and analyze the contribution of our models. We also share some unsuccessful trials that applied conventional approaches, such as recurrent neural networks, to a large-scale video dataset. All code to reproduce the results will be made publicly available soon.
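
For readers unfamiliar with the metric reported above, GAP (Global Average Precision) can be sketched as follows. This is a minimal illustration, not the authors' or the challenge's evaluation code. It assumes the commonly described definition: keep the top 20 predictions per video, pool them across all videos, rank the pool by confidence, and average the precision at each correct prediction over the total number of ground-truth labels. The function name, argument layout, and `top_k` default are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Set


def global_average_precision(
    predictions: List[Dict[int, float]],
    labels: List[Set[int]],
    top_k: int = 20,  # assumed cutoff; not stated in the abstract
) -> float:
    """Sketch of GAP over the pooled top-k predictions of all videos.

    predictions[i] maps label id -> confidence for video i.
    labels[i] is the set of ground-truth label ids for video i.
    """
    pooled = []  # (confidence, is_correct) pairs from every video
    total_positives = 0
    for scores, truth in zip(predictions, labels):
        total_positives += len(truth)
        # Keep only the k most confident labels for this video.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((conf, label in truth) for label, conf in top)

    if total_positives == 0:
        return 0.0

    # Rank the pooled predictions globally by confidence.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, correct) in enumerate(pooled, start=1):
        if correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


if __name__ == "__main__":
    # Two toy videos, each with ground-truth label 0.
    preds = [{0: 0.9, 1: 0.8}, {2: 0.7, 0: 0.1}]
    truth = [{0}, {0}]
    print(global_average_precision(preds, truth))
```

Because the ranking is global rather than per video, a confident wrong prediction on one video lowers the precision credited to correct predictions on every other video ranked below it.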

Related articles:
arXiv:1710.01559 [cs.CV] (Published 2017-10-04)
Monitoring tool usage in cataract surgery videos using boosted convolutional and recurrent neural networks
arXiv:1312.4569 [cs.CV] (Published 2013-11-05, updated 2014-03-10)
Dropout improves Recurrent Neural Networks for Handwriting Recognition
arXiv:1509.05016 [cs.CV] (Published 2015-09-16)
Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture