
Subtopic-based Multi-documents Summarization


Abstract

Multi-document summarization is an important research area of NLP. Most multi-document summarization methods either treat the document collection as having a single topic or treat every sentence as having a single topic, and they lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method. It adopts a probabilistic topic model to discover the subtopic information inside every sentence and uses a suitable hierarchical subtopic structure to describe both the whole document collection and all the sentences in it. With the sentences represented as subtopic vectors, it assesses the semantic distance of each sentence from the document collection's main subtopics and chooses the sentences with short distances as the final summary of the collection. In experiments on the DUC 2007 dataset, we found that when a topic's document collection is trained with some other topics' document collections as background knowledge, our approach achieves somewhat better ROUGE scores than the other peer systems.
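
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the SubTMS implementation: the abstract does not specify the distance measure or how the collection's main subtopics are derived, so this sketch assumes cosine distance and uses the mean of the sentence vectors as a stand-in for the main-subtopic profile. The subtopic vectors are hypothetical values standing in for the output of a topic model, and the names `cosine_distance` and `select_summary` are illustrative.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def select_summary(sentences, subtopic_vectors, max_sentences=2):
    """Pick the sentences whose subtopic vectors lie closest to the collection profile."""
    n_topics = len(subtopic_vectors[0])
    # Assumption: the collection's main subtopics are approximated by the
    # mean subtopic vector over all sentences.
    collection = [
        sum(vec[k] for vec in subtopic_vectors) / len(subtopic_vectors)
        for k in range(n_topics)
    ]
    # Rank sentence indices by increasing distance from the collection profile.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine_distance(subtopic_vectors[i], collection),
    )
    # Keep the closest sentences, restored to their original order.
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


# Hypothetical data: four sentences over three subtopics.
sentences = ["Sentence A", "Sentence B", "Sentence C", "Sentence D"]
vectors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.8, 0.1],
]
summary = select_summary(sentences, vectors, max_sentences=2)
```

In this toy data the first subtopic dominates the collection, so the two sentences weighted toward it are selected. A real system would obtain the vectors from a trained topic model and could substitute a different distance, such as a divergence between probability distributions.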
