
arXiv:1708.07690 [cs.CL]

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

Published 2017-08-25, Version 1

The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selecting a small number of sentences from each document prior to constructing the summary. Experiments were done on the DUC2004 dataset for multi-document summarization. We observe a higher performance over the original model, on par with more complex state-of-the-art methods.
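The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not state: sentences are represented as TF-IDF bag-of-words vectors (each sentence treated as its own document), similarity is cosine similarity, the summary is limited by a word budget, the greedy search stops as soon as no remaining sentence improves the summary's similarity to the centroid, and preselection simply keeps the first k sentences of each document. All function names (`tfidf_vectors`, `preselect`, `greedy_centroid_summary`) are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    """One sparse TF-IDF vector (dict) per sentence (assumed representation)."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(toks).items()}
            for toks in tokenized]


def add(a, b):
    """Element-wise sum of two sparse vectors."""
    out = dict(a)
    for w, v in b.items():
        out[w] = out.get(w, 0.0) + v
    return out


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def preselect(documents, k):
    """Assumed preselection: keep the first k sentences of each document."""
    return [s for doc in documents for s in doc[:k]]


def greedy_centroid_summary(sentences, max_words=100):
    """Greedily add the sentence that makes the whole summary most similar
    to the centroid, scoring candidate summaries rather than sentences."""
    vectors = tfidf_vectors(sentences)
    # The centroid is taken as the sum of all sentence vectors; cosine
    # similarity is scale-invariant, so sum and mean behave identically.
    centroid = {}
    for v in vectors:
        centroid = add(centroid, v)

    chosen, summary_vec, length, best_score = [], {}, 0, 0.0
    while True:
        best = None
        for i, v in enumerate(vectors):
            n_words = len(sentences[i].split())
            if i in chosen or length + n_words > max_words:
                continue
            score = cosine(add(summary_vec, v), centroid)
            if score > best_score:  # must improve on the current summary
                best, best_score = i, score
        if best is None:
            break
        chosen.append(best)
        summary_vec = add(summary_vec, vectors[best])
        length += len(sentences[best].split())
    return [sentences[i] for i in chosen]
```

A call such as `greedy_centroid_summary(preselect(documents, k=3), max_words=100)` would combine both steps, where `documents` is a list of documents, each given as a list of sentence strings. Scoring the sum of the current summary vector and the candidate vector is what distinguishes this from ranking sentences individually: a sentence that repeats content already in the summary adds little to the similarity and is less likely to be picked.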

Comments: EMNLP 2017 Workshop on New Frontiers in Summarization

Categories: cs.CL

Keywords: multi-document summarization, strong baseline, centroid-based method, complex state-of-the-art methods, simple greedy algorithm

Related articles:

arXiv:1805.04579 [cs.CL] (Published 2018-05-11)

Using Statistical and Semantic Models for Multi-Document Summarization

Divyanshu Daiya, Anukarsh Singh, Mukesh Jadon

arXiv:1905.13164 [cs.CL] (Published 2019-05-30)

Hierarchical Transformers for Multi-Document Summarization

Yang Liu, Mirella Lapata

arXiv:1910.11411 [cs.CL] (Published 2019-10-24)

Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations