
How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds



Abstract

Extractive text summarization is an effective way to automatically reduce a text to a summary by selecting a subset of the text. The performance of a summarization system is usually evaluated by comparison with human-constructed extractive summaries provided in annotated text datasets. However, for datasets in which the abstract is written for readers, performance is evaluated against an abstract that a human wrote in his or her own words. This makes it difficult to determine how far state-of-the-art extractive methods are from the upper bound that an ideal extractive method might achieve. In addition, the performance of an extractive method differs from domain to domain, which makes it difficult to benchmark. Previous studies construct an ideal sentence-based extract of a document, the one that gives the best score on a given metric, by exhaustively searching all possible sentence combinations of a given length. They then use the performance of that extract as the sentence-based upper bound. However, this is feasible only for short texts. For long texts and multiple documents, previous studies rely on manual effort, which is expensive and time-consuming. In this paper, we propose nine fast heuristic methods to generate near-ideal sentence-based extracts for long texts and multiple documents. Furthermore, we propose an n-gram construction method to construct the word-based upper bound. A percentage ranking method is used to benchmark different extractive methods across different corpora. Five different corpora are used in the experiments. The results show that the near upper bounds constructed by the proposed methods are close to those obtained by exhaustive search, while the proposed methods are much faster. Six general extractive summarization methods were also assessed to show the gap between their performance and the near upper bounds. (C) 2017 Elsevier Ltd. All rights reserved.
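
To make the contrast between exhaustive search and a fast heuristic concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not describe the nine heuristics or name the metric, so the sketch assumes a simple ROUGE-1-style unigram recall as the score and uses plain greedy forward selection as a stand-in heuristic. All function names and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_recall(candidate_sentences, reference):
    """Assumed metric: clipped unigram overlap divided by reference length."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(" ".join(candidate_sentences).lower().split())
    overlap = sum(min(n, cand_counts[word]) for word, n in ref_counts.items())
    return overlap / max(sum(ref_counts.values()), 1)


def exhaustive_extract(sentences, reference, k):
    """Score every k-sentence combination and keep the best one.

    The number of combinations grows as C(len(sentences), k), which is
    why this is only practical for short texts.
    """
    best = max(combinations(sentences, k),
               key=lambda combo: unigram_recall(combo, reference))
    return list(best)


def greedy_extract(sentences, reference, k):
    """Stand-in heuristic (an assumption, not one of the paper's nine):
    repeatedly add the sentence that gives the highest score so far."""
    chosen = []
    remaining = list(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda s: unigram_recall(chosen + [s], reference))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data, invented purely for illustration.
sentences = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "a cat and a dog played on the mat",
    "the weather was sunny",
]
reference = "a cat and a dog sat on the mat"

ideal = exhaustive_extract(sentences, reference, k=2)
near_ideal = greedy_extract(sentences, reference, k=2)
print(unigram_recall(ideal, reference), unigram_recall(near_ideal, reference))
```

On this toy input the greedy selection happens to pick the same two sentences as the exhaustive search. In general a greedy pass evaluates far fewer candidates and can settle on a lower-scoring extract, which is the sense in which such a bound is only "near" the upper bound.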

Bibliographic details

  • Source
    Expert Systems with Applications | 2017, Issue 30 | pp. 439-463 | 25 pages
  • Author affiliations

    Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

    Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

    Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

    Hong Kong Polytech Univ, Dept Ind & Syst Engn, Knowledge Management & Innovat Res Ctr, Hong Hom 999077, Hong Kong, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Extractive text summarization; Upper bound construction; Ideal extracts construction; Summarization evaluation;

