Journal: Information Processing & Management

A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities


Abstract

Given the vast amount of textual data available today, automated extractive text summarization is one of the most common and practical techniques for organizing information. Extractive summarization selects the most appropriate sentences from the text and provides a representative summary. Sentences, as individual textual units, are usually too short for major text processing techniques to perform well on them. Hence, it seems vital to bridge the gap between short text units and conventional text processing methods. In this study, we propose a semantic method for implementing an extractive multi-document summarizer system that combines statistical, machine-learning-based, and graph-based methods. The system is language-independent and unsupervised. The proposed framework learns the semantic representation of words from a set of given documents via the word2vec method. It expands each sentence, through an innovative method, with the most informative and least redundant words related to the main topic of the sentence. Sentence expansion implicitly performs word sense disambiguation and tunes the conceptual densities towards the central topic of each sentence. The framework then estimates the importance of sentences by using the graph representation of the documents. To identify the most important topics of the documents, we propose an inventive clustering approach. It autonomously determines the number of clusters and their initial centroids, and clusters sentences accordingly. The system selects the best sentences from appropriate clusters for the final summary with respect to information salience, minimum redundancy, and adequate coverage. An extensive set of experiments on the DUC2002 and DUC2006 datasets was conducted to investigate the proposed scheme. Experimental results showed that the proposed sentence expansion algorithm and clustering approach could considerably enhance the performance of the summarization system. In addition, comparative experiments demonstrated that the proposed framework outperforms most state-of-the-art summarizer systems and can substantially aid the task of extractive text summarization.
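The abstract gives no algorithmic detail, so the following is only a rough illustration of the general shape of such a pipeline (word vectors, sentence expansion, graph-based ranking, redundancy-aware selection), not the authors' method. Everything in it is an assumption made for illustration: sentence-level co-occurrence counts stand in for word2vec embeddings, the expansion rule in `expand` is hypothetical, a PageRank-style iteration stands in for the paper's graph scoring, a simple similarity threshold replaces the proposed clustering step, and whitespace tokenization is used for brevity.

```python
import math
from collections import Counter

def cooccurrence_vectors(sentences):
    """Toy stand-in for word2vec: sentence-level co-occurrence counts per word."""
    vecs = {}
    for tokens in sentences:
        unique = set(tokens)
        for w in unique:
            vecs.setdefault(w, Counter()).update(t for t in unique if t != w)
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand(tokens, vecs, k=2):
    """Hypothetical expansion rule: append the k words nearest the sentence centroid."""
    centroid = Counter()
    for w in tokens:
        centroid.update(vecs.get(w, {}))
    candidates = [w for w in vecs if w not in tokens]
    candidates.sort(key=lambda w: (-cosine(vecs[w], centroid), w))
    return list(tokens) + candidates[:k]

def rank(bags, damping=0.85, iters=50):
    """PageRank-style power iteration over a cosine-similarity sentence graph."""
    n = len(bags)
    if n == 0:
        return []
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * sim[j][i] / out[j]
                            for j in range(n) if out[j])
            for i in range(n)
        ]
    return scores

def summarize(text_sentences, size=2, max_sim=0.5):
    """Pick up to `size` high-scoring sentences, skipping near-duplicates."""
    tokenized = [s.lower().split() for s in text_sentences]
    vecs = cooccurrence_vectors(tokenized)
    bags = [Counter(expand(t, vecs)) for t in tokenized]
    scores = rank(bags)
    chosen = []
    for i in sorted(range(len(bags)), key=lambda i: -scores[i]):
        if all(cosine(bags[i], bags[j]) <= max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == size:
            break
    return [text_sentences[i] for i in sorted(chosen)]

if __name__ == "__main__":
    docs = [
        "the storm flooded the coastal town",
        "rescue teams reached the flooded town",
        "the storm damaged coastal roads",
        "officials praised the rescue teams",
    ]
    print(summarize(docs, size=2))
```

In this sketch, expansion happens before graph construction, so two short sentences that share no surface words can still be linked through the words added to them. That is the intuition the abstract describes for bridging short text units and conventional text processing methods.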
