arXiv:1803.06316 [cs.CV]
Activity Detection with Latent Sub-event Hierarchy Learning
AJ Piergiovanni, Michael S. Ryoo
Published 2018-03-16, Version 1
In this paper, we introduce a new convolutional layer, the Temporal Gaussian Mixture (TGM) layer, and show how it can be used to efficiently capture temporal structure in continuous activity videos. Our layer is designed to allow the model to learn a latent hierarchy of sub-event intervals. Our approach is fully differentiable while relying on significantly fewer parameters, enabling end-to-end training with standard backpropagation. We present convolutional video models with multiple TGM layers for activity detection. Our experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the benefit of our TGM layers, showing that they outperform other models and temporal convolutions.
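The abstract does not spell out how a TGM kernel is formed, so the following is only a minimal sketch of the general idea of a temporal kernel built from a mixture of Gaussians, not the authors' implementation. The function names, the softmax mixing, and all parameter values are assumptions made for illustration.

```python
import math

def gaussian_kernel(center, width, length):
    """Gaussian sampled at integer positions 0..length-1, normalized to sum to 1."""
    weights = [math.exp(-0.5 * ((t - center) / width) ** 2) for t in range(length)]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_kernel(centers, widths, mixing_logits, length):
    """Combine several Gaussian kernels into one kernel using softmax weights."""
    mix = softmax(mixing_logits)
    kernels = [gaussian_kernel(c, w, length) for c, w in zip(centers, widths)]
    return [sum(a * k[t] for a, k in zip(mix, kernels)) for t in range(length)]

def temporal_conv(signal, kernel):
    """'Valid' 1-D correlation of a per-frame feature sequence with the kernel."""
    n = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(n))
            for i in range(len(signal) - n + 1)]

# Hypothetical parameters: two Gaussians over a 9-frame window.
kernel = mixture_kernel(centers=[2.0, 6.0], widths=[1.0, 1.5],
                        mixing_logits=[0.0, 0.0], length=9)

# Toy per-frame activations: an "event" occupying frames 5..14.
features = [0.0] * 5 + [1.0] * 10 + [0.0] * 5
response = temporal_conv(features, kernel)
```

In this sketch the 9-frame kernel is described by six numbers (two centers, two widths, two mixing logits) rather than nine free weights, and that count does not grow with the kernel length. This is one way a layer of this kind could use fewer parameters than an ordinary temporal convolution while remaining differentiable in every parameter.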