International Conference on Distributed Computing and Internet Technologies

A New Automatic Multi-document Text Summarization using Topic Modeling

Abstract

This paper proposes a novel methodology for generating an extractive text summary from a corpus of documents. Unlike most existing methods, our approach is designed so that the final summary covers all the important topics in the corpus. We propose a heuristic method that uses Latent Dirichlet Allocation to identify the optimum number of independent topics present in the corpus. For each independent topic, important sentences are then identified using a set of word-level and sentence-level features. To ensure that the final summary is coherent, we suggest a novel technique for reordering the sentences based on sentence similarity. The use of topic modeling ensures that all the important content from the corpus is captured in the extracted summary, which in turn strengthens the summary. Experimental results show that the proposed approach is promising.
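The abstract does not say how the similarity-based reordering is carried out, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes bag-of-words cosine similarity and a greedy chaining rule (keep the first sentence, then always append the remaining sentence most similar to the one just placed). The function names `bag_of_words`, `cosine`, and `reorder_by_similarity`, and the sample sentences, are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reorder_by_similarity(sentences):
    """Greedy chaining (an assumption, not the paper's stated rule):
    keep the first sentence, then repeatedly append the remaining
    sentence most similar to the one just placed."""
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    remaining = list(range(1, len(sentences)))
    order = [0]
    while remaining:
        last = vectors[order[-1]]
        # max() returns the first maximal element, so ties keep input order.
        best = max(remaining, key=lambda i: cosine(last, vectors[i]))
        remaining.remove(best)
        order.append(best)
    return [sentences[i] for i in order]


# Hypothetical sentences standing in for those extracted per topic.
selected = [
    "Topic models find themes in documents.",
    "Summaries should be short and coherent.",
    "Themes in documents guide sentence selection.",
    "Sentence selection keeps summaries short.",
]
for sentence in reorder_by_similarity(selected):
    print(sentence)
```

Greedy chaining keeps adjacent sentences lexically close, which is one simple proxy for coherence. A different similarity measure or a global ordering objective would fit the same description in the abstract.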