
arXiv:1406.5824 [cs.CV]

VideoSET: Video Summary Evaluation through Text

Serena Yeung, Alireza Fathi, Li Fei-Fei

Published 2014-06-23 (Version 1)

In this paper we present VideoSET, a method for Video Summary Evaluation through Text that can evaluate how well a video summary is able to retain the semantic information contained in its original video. We observe that semantics is most easily expressed in words, and develop a text-based approach for the evaluation. Given a video summary, a text representation of the video summary is first generated, and an NLP-based metric is then used to measure its semantic distance to ground-truth text summaries written by humans. We show that our technique has higher agreement with human judgment than pixel-based distance metrics. We also release text annotations and ground-truth text summaries for a number of publicly available video datasets, for use by the computer vision community.
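The evaluation pipeline described above — map the video summary to text, then score it against human-written ground-truth summaries with an NLP metric — can be sketched minimally. The abstract does not name the specific metric, so the sketch below uses a simple unigram-overlap F-score (ROUGE-1 style) as a stand-in; the function names and the choice of metric are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: unigram overlap between candidate and reference.
    (Illustrative stand-in for the NLP metric used by VideoSET.)"""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def semantic_distance(summary_text: str, ground_truths: list[str]) -> float:
    """Distance of a summary's text representation from the closest
    human-written ground-truth summary: lower means more semantic
    information retained. (Hypothetical helper for illustration.)"""
    return 1.0 - max(unigram_f1(summary_text, gt) for gt in ground_truths)
```

A summary whose text representation matches a ground-truth summary word-for-word gets distance 0, while one sharing no words gets distance 1; any real instantiation would swap in a stronger semantic similarity measure.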

Related articles:
arXiv:1310.2053 [cs.CV] (Published 2013-10-08)
The role of RGB-D benchmark datasets: an overview
arXiv:1705.04402 [cs.CV] (Published 2017-05-11)
Negative Results in Computer Vision: A Perspective
arXiv:2312.04563 [cs.CV] (Published 2023-12-07)
Visual Geometry Grounded Deep Structure From Motion