Query-oriented unsupervised multi-document summarization via deep learning model


Abstract

Capturing the compositional process from words to documents is a key challenge in natural language processing and information retrieval. Extractive query-oriented multi-document summarization generates a summary by extracting a proper set of sentences from multiple documents based on a pre-given query. This paper proposes a novel document summarization framework based on a deep learning model, which has shown outstanding extraction ability in many real-world applications. The framework consists of three parts: concept extraction, summary generation, and reconstruction validation. A new query-oriented extraction technique is proposed to extract information distributed across multiple documents. The whole deep architecture is then fine-tuned by minimizing the information loss in reconstruction validation. Based on the concepts extracted layer by layer from the deep architecture, dynamic programming is used to seek the most informative set of sentences for the summary. Experiments on three benchmark datasets (DUC 2005, 2006, and 2007) assess and confirm the effectiveness of the proposed framework and algorithms. The experimental results show that the proposed method outperforms state-of-the-art extractive summarization approaches. Moreover, we also provide a statistical analysis of query words based on Amazon's Mechanical Turk (MTurk) crowdsourcing platform. There exist underlying relationships from topic words to the content which can contribute to the summarization task. (C) 2015 Elsevier Ltd. All rights reserved.
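
The abstract states that dynamic programming is used to seek the most informative set of sentences, but it does not give the formulation. One plausible reading is a 0/1 knapsack over candidate sentences under a summary length budget. The sketch below illustrates that reading only; it is not the authors' algorithm. The function name `select_sentences`, the per-sentence scores, and the word budget are all hypothetical, and in the paper the informativeness of a sentence would come from the concepts extracted by the deep architecture rather than from fixed numbers.

```python
from typing import List, Tuple


def select_sentences(
    scores: List[float], lengths: List[int], budget: int
) -> Tuple[float, List[int]]:
    """0/1 knapsack: choose sentence indices that maximize the total score
    while the total length stays within `budget` words.

    Assumption: scores and lengths are given; the paper derives
    informativeness from its deep model, which is not reproduced here.
    """
    n = len(scores)
    # best[i][b] = best total score using the first i sentences and at most b words
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score, length = scores[i - 1], lengths[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if length <= b:
                candidate = best[i - 1][b - length] + score
                if candidate > best[i][b]:
                    best[i][b] = candidate  # option 2: take sentence i-1

    # Backtrack through the table to recover which sentences were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


if __name__ == "__main__":
    # Hypothetical toy input: four sentences with made-up scores and word counts.
    total, picked = select_sentences(
        scores=[3.0, 4.0, 5.0, 6.0], lengths=[2, 3, 4, 5], budget=5
    )
    print(total, picked)
```

With the toy input, the two shorter sentences together beat the single highest-scoring sentence, which is the kind of trade-off a greedy top-score selection would miss.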

Bibliographic record

  • Source
    Expert Systems with Applications | 2015, Issue 21 | pp. 8146-8155 | 10 pages
  • Author affiliations

    Shen Zhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China|Hong Kong Polytech Univ, Dept Comp, Kowloon 999077, Hong Kong, Peoples R China;

    Hong Kong Polytech Univ, Dept Comp, Kowloon 999077, Hong Kong, Peoples R China;

    City Univ Hong Kong, Dept Linguist & Translat, Kowloon 999077, Hong Kong, Peoples R China;

    Nanjing Univ, Sch Business, Nanjing 210093, Jiangsu, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Deep learning; Query-oriented summarization; Multi-document; Neocortex simulation;
