Journal: Expert Systems with Applications

Extractive multi-document summarization using population-based multicriteria optimization



Abstract

Multi-document summarization is the process of extracting salient information from a set of source texts and presenting that information to the user in a condensed form. In this paper, we propose a multi-document summarization system that generates an extractive generic summary with maximum relevance and minimum redundancy by representing each sentence of the input document as a vector of words drawn from the Proper Noun, Noun, Verb and Adjective sets. Five features associated with the sentences (TF_ISF, Aggregate Cross Sentence Similarity, Title Similarity, Proper Noun and Sentence Length) are extracted, and scores are assigned to sentences based on these features. The weights assigned to different features may vary with the nature of the document, and it is hard to discover the most appropriate weight for each feature, which makes generating a good summary very difficult without human intelligence. The multi-document summarization problem has a large number of decision parameters and a large number of possible solutions from which the optimal summary must be generated. A generated summary may not guarantee the essential quality and may be far from the ideal human-generated summary. To address this issue, we propose a population-based multicriteria optimization method with multiple objective functions. Three objective functions are selected to determine an optimal summary, with maximum relevance, diversity, and novelty, from a global population of summaries by considering both the statistical and semantic aspects of the documents. Semantic aspects are captured using Latent Semantic Analysis (LSA) and Non-negative Matrix Factorization (NMF) techniques. Experiments were performed on the DUC 2002, DUC 2004 and DUC 2006 datasets using the ROUGE toolkit. Experimental results show that our system outperforms state-of-the-art work in terms of Recall and Precision. (C) 2017 Elsevier Ltd. All rights reserved.
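
To make two ideas from the abstract concrete, the sketch below computes a TF-ISF score per sentence and then filters a small population of candidate summaries down to the non-dominated ones under several objectives. It is a minimal illustration and not the authors' implementation: the abstract does not give formulas, so the mean-over-tokens TF-ISF definition, the use of Pareto dominance as the multicriteria selection rule, and the names `tf_isf_scores`, `dominates` and `pareto_front` are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Return one TF-ISF score per sentence (each sentence is a list of tokens).

    Assumed definition: the sum over distinct terms of tf * log(N / sf),
    divided by the sentence length, where sf is the number of sentences
    that contain the term and N is the number of sentences.
    """
    n = len(sentences)
    # Sentence frequency: in how many sentences does each term occur?
    sf = Counter(term for s in sentences for term in set(s))
    scores = []
    for s in sentences:
        if not s:
            scores.append(0.0)
            continue
        tf = Counter(s)
        total = sum(count * math.log(n / sf[term]) for term, count in tf.items())
        scores.append(total / len(s))
    return scores


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly
    better somewhere (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (label, objectives) pairs that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates if other is not c)
    ]


# Toy input: three tokenized sentences.
sentences = [["cat", "sat"], ["cat", "ran"], ["dog", "barked"]]
scores = tf_isf_scores(sentences)

# Hypothetical candidate summaries with made-up objective values in the
# order (relevance, diversity, novelty).
population = [
    ("A", (0.9, 0.2, 0.5)),
    ("B", (0.5, 0.8, 0.5)),
    ("C", (0.4, 0.1, 0.4)),
]
front = pareto_front(population)
```

In the toy data, the third sentence shares no terms with the others and so receives the highest TF-ISF score, and candidate C is dropped from the front because A is at least as good on every objective. A population-based optimizer would repeat this kind of selection over many generations of candidate summaries. How the paper combines its three objectives is not stated in the abstract.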
