Conference: International Conference on Soft Computing and Intelligent Systems; International Symposium on Advanced Intelligent Systems

Multi-document Summarization by Creating Synthetic Document Vector Based on Language Model

Abstract

Multi-document summarization aims to create a summary covering the major information that multiple documents share. Existing methods rely on hand-crafted features for words and sentences. However, it is difficult to identify the core content of each document with hand-crafted features because they capture only limited information about the given documents. Identifying the major information is further limited because documents with the same meaning tend to be phrased differently depending on their writers. It is therefore necessary to represent the semantic meaning of documents as well as sentences through natural language understanding. In this paper, we propose a new multi-document summarization system that creates a synthetic document vector covering the whole set of documents based on a language model, which is well known for learning the semantic features of text. We experimented with the DUC 2004 dataset provided by the Document Understanding Conference (DUC), and the results show that our method summarizes multiple documents effectively based on their core contents.
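The abstract does not say how the synthetic document vector is built or how sentences are selected, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes the synthetic vector is the element-wise mean of the per-document vectors and that summary sentences are those closest to it by cosine similarity. The names `embed`, `cosine`, and `summarize` are hypothetical, and the term-count `embed` function is a toy stand-in for a language-model embedding.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    """Toy stand-in for a language-model embedding: a term-count vector.

    Assumption: a real system would call a trained language model here.
    """
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(documents, k=1):
    """Return the k sentences most similar to the synthetic document vector."""
    vocab = sorted({word for doc in documents for word in tokenize(doc)})
    doc_vectors = [embed(doc, vocab) for doc in documents]

    # Assumed construction: the synthetic vector is the mean of the document vectors.
    synthetic = [sum(column) / len(doc_vectors) for column in zip(*doc_vectors)]

    # Naive sentence splitting on periods, sufficient for this illustration.
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]

    ranked = sorted(
        sentences,
        key=lambda s: cosine(embed(s, vocab), synthetic),
        reverse=True,
    )
    return ranked[:k]
```

Calling `summarize(documents, k=2)` on a list of document strings returns the two sentences that best match the content the documents have in common under these assumptions. Swapping `embed` for a learned sentence or document encoder would bring the sketch closer to the semantic representation the abstract argues for.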
