Enhancing Diversity, Coverage and Balance for Summarization through Structure Learning

Presented at: 18th International World Wide Web Conference (WWW2009)

by Liangda Li, Ke Zhou, Gui-Rong Xue, Hongyuan Zha, Yong Yu

Webpage: http://www2009.eprints.org/8/1/p71.pdf

Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, even though these issues largely determine the quality of summaries. In this paper, we consider extract-based summarization with emphasis on three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which aims to avoid losing the document's main information when generating the summary; and 3) balance, which demands that the different aspects of the document have about the same relative importance in the summary. We formulate extract-based summarization as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints into a structure learning framework; we explore the graph structure of the output variables and employ a structural SVM to solve the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of the F1 and ROUGE metrics.
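The abstract names the three requirements without defining them formally. The sketch below makes them concrete under loudly stated assumptions: sentences are bags of words, each sentence carries a hypothetical precomputed aspect label (for example from clustering), and the three terms are combined with hand-set weights and maximized greedily. This is not the authors' method: the paper learns the sentence-to-subset mapping with a structural SVM, whereas here nothing is learned. Balance is read here as "each aspect's share of the summary should track its share of the document", which is one possible interpretation. The helper `extract_f1` shows one way to score a predicted extract against a reference extract by sentence index; ROUGE is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately crude sentence representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_summary(selected, vectors, doc_vec, aspects, weights=(1.0, 0.5, 0.5)):
    """Hypothetical objective: coverage minus redundancy minus imbalance.

    The weights are illustrative constants, not values from the paper.
    """
    if not selected:
        return 0.0
    w_cov, w_red, w_bal = weights
    # Coverage: how close the summary's word distribution is to the document's.
    summary_vec = sum((vectors[i] for i in selected), Counter())
    coverage = cosine(summary_vec, doc_vec)
    # Diversity: penalize the most similar pair of selected sentences.
    redundancy = max(
        (cosine(vectors[i], vectors[j]) for i in selected for j in selected if i < j),
        default=0.0,
    )
    # Balance: compare each aspect's share of the summary with its share of the document.
    doc_share = Counter(aspects)
    sum_share = Counter(aspects[i] for i in selected)
    imbalance = sum(
        abs(sum_share[a] / len(selected) - doc_share[a] / len(aspects)) for a in doc_share
    )
    return w_cov * coverage - w_red * redundancy - w_bal * imbalance


def greedy_extract(sentences, aspects, k):
    """Greedily add the sentence that most improves the objective, up to k sentences."""
    vectors = [bag_of_words(s) for s in sentences]
    doc_vec = sum(vectors, Counter())
    selected = []
    while len(selected) < k:
        candidates = [i for i in range(len(sentences)) if i not in selected]
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda i: score_summary(selected + [i], vectors, doc_vec, aspects),
        )
        selected.append(best)
    return sorted(selected)


def extract_f1(predicted, reference):
    """F1 between predicted and reference sets of sentence indices."""
    p, r = set(predicted), set(reference)
    overlap = len(p & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentences = [
        "The storm hit the coast on Monday.",
        "The storm hit the coast on Monday.",  # verbatim duplicate
        "Officials said rescue teams were deployed.",
    ]
    aspects = [0, 0, 1]  # hypothetical aspect labels
    picked = greedy_extract(sentences, aspects, k=2)
    print(picked)  # [0, 2]
    print(extract_f1(picked, reference=[0, 2]))  # 1.0
```

With a duplicated sentence in the input, the redundancy and balance terms steer the second pick toward the other aspect instead of the copy. In the paper's setting the trade-off between such terms is learned from data rather than fixed, so the `weights` tuple here is purely illustrative.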

Keywords: Data Mining

Resource URI on the dog food server: http://data.semanticweb.org/conference/www/2009/paper/8
