
On redundancy in multi-document summarization



Abstract

In this paper we study how the presence or absence of redundancy across multiple related texts can be used to compute sentence relevance for extractive multi-document summarization. Two types of redundancy can be found: intra-document and inter-document. Experimenting with them surfaces different kinds of content, for example: statements redundant between documents, which can be important for their popularity; statements that are not redundant, which can be important for their novelty; or statements redundant within each document, which can be important because a single author addresses them repeatedly. We propose an unsupervised graph-based method that generates summaries based on different redundancy strategies. We present experiments on two DUC corpora with nine different strategies for extracting information, depending on how redundancy within a document and across documents is handled. According to the DUC gold standards, we found that a generic multi-document summary should contain the most redundant (popular) information across different sources while avoiding local intra-document redundancy. We implemented a mechanism to enrich sentence rankings with redundancy, which improved summary evaluation results.
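
The idea of scoring sentences by inter-document and intra-document redundancy can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify. It assumes a bag-of-words cosine similarity, a weighted-degree score over a sentence graph, and two hypothetical parameters, `w_inter` and `w_intra`, that reward or penalize each kind of redundancy. The function names and the toy documents are likewise illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(docs, w_inter=1.0, w_intra=-0.5):
    """Rank sentences by weighted similarity to all other sentences.

    docs: list of documents, each a list of sentence strings.
    w_inter: weight for similarity to sentences in other documents
             (assumed parameter; positive rewards popularity).
    w_intra: weight for similarity to sentences in the same document
             (assumed parameter; negative penalizes local repetition).
    Returns (score, doc_index, sentence) tuples, highest score first.
    """
    nodes = [
        (d, s, Counter(tokenize(s)))
        for d, doc in enumerate(docs)
        for s in doc
    ]
    scored = []
    for i, (doc_i, sent_i, vec_i) in enumerate(nodes):
        score = 0.0
        for j, (doc_j, _, vec_j) in enumerate(nodes):
            if i == j:
                continue
            weight = w_intra if doc_i == doc_j else w_inter
            score += weight * cosine(vec_i, vec_j)
        scored.append((score, doc_i, sent_i))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy input: three short "documents" about the same event.
docs = [
    ["The storm flooded the city centre.",
     "The storm flooded the city centre again.",
     "Officials opened two shelters."],
    ["Heavy storm flooded the city centre on Monday.",
     "Schools stayed closed."],
    ["The city centre was flooded by the storm.",
     "Trains were delayed."],
]

# Popularity strategy: reward inter-document redundancy, penalize intra-document.
ranked = rank_sentences(docs)

# Novelty strategy: penalize inter-document redundancy, ignore intra-document.
novel = rank_sentences(docs, w_inter=-1.0, w_intra=0.0)

for score, doc_index, sentence in ranked[:3]:
    print(f"{score:.3f}  doc {doc_index}  {sentence}")
```

Changing the signs of the two weights switches between strategies such as popularity and novelty. The nine strategies evaluated in the paper may be defined differently.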
