arXiv Analytics

arXiv:1708.07690 [cs.CL]

Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

Demian Gholipour Ghalandari

Published 2017-08-25, Version 1

The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selecting a small number of sentences from each document prior to constructing the summary. Experiments were done on the DUC2004 dataset for multi-document summarization. We observe a higher performance over the original model, on par with more complex state-of-the-art methods.
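
The greedy, summary-level ranking described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the vector representation, the similarity measure, or the length budget, so the raw bag-of-words counts, the cosine similarity, the word limit, and the names `bow`, `cosine`, and `greedy_centroid_summary` are all assumptions made for illustration. The per-document sentence preselection step is left out.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: counts of lowercase whitespace tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_centroid_summary(sentences, max_words):
    """Greedily add the sentence that most improves the similarity between
    the candidate summary as a whole and the centroid of the input."""
    # Assumption: the centroid is the sum of all sentence vectors.
    centroid = Counter()
    for s in sentences:
        centroid.update(bow(s))

    summary, summary_vec, score = [], Counter(), 0.0
    remaining = list(sentences)
    while remaining:
        used = sum(len(s.split()) for s in summary)
        best, best_score = None, score
        for s in remaining:
            if used + len(s.split()) > max_words:
                continue  # sentence would exceed the word budget
            cand_score = cosine(summary_vec + bow(s), centroid)
            if cand_score > best_score:
                best, best_score = s, cand_score
        if best is None:
            break  # no remaining sentence fits and improves the score
        summary.append(best)
        summary_vec = summary_vec + bow(best)
        score = best_score
        remaining.remove(best)
    return summary


if __name__ == "__main__":
    docs = ["cats chase mice", "cats chase mice daily", "stocks fell sharply"]
    print(greedy_centroid_summary(docs, max_words=7))
```

Because the score is computed on the whole candidate summary rather than on each sentence in isolation, a sentence that repeats what is already selected adds little, which is the difference from the original sentence-level ranking that the abstract points to.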

Comments: EMNLP 2017 Workshop on New Frontiers in Summarization
Categories: cs.CL
Related articles:
arXiv:1805.04579 [cs.CL] (Published 2018-05-11)
Using Stastical and Semantic Models for Multi-Document Summarization
arXiv:1905.13164 [cs.CL] (Published 2019-05-30)
Hierarchical Transformers for Multi-Document Summarization
arXiv:1910.11411 [cs.CL] (Published 2019-10-24)
Multi-Document Summarization with Determinantal Point Processes and Contextualized Representations