
arXiv:1702.01528 [cs.CV]

Textually Customized Video Summaries

Jinsoo Choi, Tae-Hyun Oh, In So Kweon

Published 2017-02-06 (Version 1)

The best summary of a long video differs among people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. First, we train a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data in a progressive and residual manner. Given a user-specific text description, our algorithm selects semantically relevant video segments and produces a temporally aligned video summary. To evaluate our textually customized video summaries, we conduct experimental comparisons with baseline methods that utilize ground-truth information. Despite these challenging baselines, our method achieves comparable or even better performance. We also show that our method can generate semantically diverse video summaries using only the learned visual embeddings.
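To make the selection step concrete, here is a minimal sketch of query-driven segment selection, assuming frame embeddings and the text embedding already live in a shared semantic space. The paper's progressive, residual embedding network is not reproduced here; the function names, segment length, and budget below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_scores(frame_embs: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity of each frame embedding (T x D) to the query (D,)."""
    frames = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    query = text_emb / np.linalg.norm(text_emb)
    return frames @ query

def summarize(frame_embs: np.ndarray, text_emb: np.ndarray,
              seg_len: int = 16, budget: int = 5) -> list:
    """Pick the `budget` fixed-length segments most relevant to the text
    query, returned in temporal order so the summary stays aligned with
    the source video."""
    scores = cosine_scores(frame_embs, text_emb)
    n_segs = len(scores) // seg_len
    # Average per-frame relevance within each segment.
    seg_scores = scores[: n_segs * seg_len].reshape(n_segs, seg_len).mean(axis=1)
    top = np.argsort(seg_scores)[-budget:]
    return [range(i * seg_len, (i + 1) * seg_len) for i in sorted(top)]

# Toy usage with random embeddings standing in for a learned model.
rng = np.random.default_rng(0)
segments = summarize(rng.normal(size=(320, 128)), rng.normal(size=128))
print([(s.start, s.stop) for s in segments])
```

Sorting the selected segment indices before emitting them is what keeps the output temporally aligned, matching the behavior the abstract describes.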

Related articles:
arXiv:1401.3590 [cs.CV] (Published 2014-01-14, updated 2016-04-19)
An Enhanced Method For Evaluating Automatic Video Summaries
arXiv:1609.08758 [cs.CV] (Published 2016-09-28)
Video Summarization using Deep Semantic Features
arXiv:2002.03740 [cs.CV] (Published 2020-01-31)
Convolutional Hierarchical Attention Network for Query-Focused Video Summarization