Journal: Information Processing & Management

Text summarization using topic-based vector space model and semantic measure



Abstract

The primary shortcoming of extractive text summarization is redundancy: more than one sentence conveying similar information ends up in the summary. Many extractive summarization methods have been proposed over the last two decades, but the redundancy issue has received less attention. In this paper, we propose a text summarization technique that incorporates topic modeling and a semantic measure within the vector space model to produce an extractive summary of a given text. Our main objective is to address the redundancy problem associated with summarization methods and to include in the summary only those sentences that represent the most topics embedded in the given document. We generate the topic vector of the document by representing its sentences in an intermediate form using a vector space model and topic modeling. Moreover, to make the proposed method efficient, we incorporate a semantic similarity measure to determine the relevance of each sentence. We introduce two ways to create the topic vector from the document: the Combined topic vector approach and the Individual topic vector approach. Evaluation results on two datasets show that the summaries generated by both variants of the proposed method (Combined and Individual topic vector) are closer to human-generated summaries than those produced by existing text summarization methods.
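The general idea of scoring sentences against a topic vector while rejecting redundant ones can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the topic model or the semantic measure, so the sketch assumes a plain term-count centroid as a stand-in for the topic vector and cosine similarity as a stand-in for the semantic measure. The names `tokenize`, `cosine`, `summarize`, and `redundancy_threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase alphabetic tokens; a fuller pipeline would also drop stop words.
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, k=2, redundancy_threshold=0.5):
    # Vector space model: one term-count vector per sentence.
    vectors = [Counter(tokenize(s)) for s in sentences]

    # ASSUMPTION: the document centroid stands in for the topic vector.
    # The paper derives its topic vector from topic modeling instead.
    topic_vector = Counter()
    for v in vectors:
        topic_vector.update(v)

    # Rank sentences by relevance to the (stand-in) topic vector.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], topic_vector),
        reverse=True,
    )

    # Greedy selection: skip a sentence that is too similar to one already
    # chosen, which is the redundancy check.
    selected = []
    for i in ranked:
        if len(selected) == k:
            break
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold
               for j in selected):
            selected.append(i)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    doc = [
        "Topic models find themes in text.",
        "Topic models find themes in documents.",
        "The weather was sunny today.",
    ]
    for sentence in summarize(doc, k=2):
        print(sentence)
```

With the sample document, the two near-duplicate sentences about topic models compete for one slot, so only one of them can appear alongside the unrelated sentence. Replacing the centroid with vectors from an actual topic model, and cosine similarity with a semantic similarity measure, would bring the sketch closer to what the abstract describes.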
