
Subtopic-based Multi-documents Summarization

Abstract

Multi-document summarization is an important research area of NLP. Most multi-document summarization methods either treat the document collection as single-topic or treat every sentence as single-topic, and lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole document collection and all sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the document collection's main subtopics and chooses the sentences with the shortest distances as the final summary of the collection. In experiments on the DUC 2007 dataset, we found that when a topic's document collection is trained with some other topics' document collections as background knowledge, our approach achieves better ROUGE scores than other peer systems.
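
The selection step described in the abstract (sentences as subtopic vectors, ranked by distance from the collection's main subtopics) can be illustrated with a minimal sketch. The abstract does not name the distance measure or the selection rule, so the Jensen-Shannon divergence, the top-k cutoff, and every name below (`js_divergence`, `select_summary`, the toy vectors) are assumptions made for illustration, not the paper's actual SubTMS procedure. The subtopic vectors are taken as given; in practice they would come from a probabilistic topic model.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed distance; symmetric and always finite)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def select_summary(sentences, sentence_vectors, collection_vector, k):
    """Pick the k sentences whose subtopic vectors lie closest to the collection's."""
    scored = sorted(
        (js_divergence(vec, collection_vector), idx)
        for idx, vec in enumerate(sentence_vectors)
    )
    chosen = sorted(idx for _, idx in scored[:k])  # restore original sentence order
    return [sentences[i] for i in chosen]


# Hypothetical toy data: three sentences over three subtopics.
sentences = ["Sentence A", "Sentence B", "Sentence C"]
sentence_vectors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]
collection_vector = [0.65, 0.25, 0.10]  # assumed main-subtopic distribution

summary = select_summary(sentences, sentence_vectors, collection_vector, k=2)
print(summary)
```

In this toy input, sentences A and C have subtopic mixtures close to the collection's, while B is dominated by a minor subtopic, so A and C are selected. A real system would also need a length budget and some redundancy control, neither of which the abstract describes.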
