How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds

Wang W. M.; Li Z.; Wang J. W.; Zheng Z. H.

Abstract

Extractive text summarization is an effective way to automatically reduce a text to a summary by selecting a subset of the text. The performance of a summarization system is usually evaluated by comparing its output with human-constructed extractive summaries provided in annotated text datasets. However, for datasets in which an abstract is written for readers, the system is instead evaluated against an abstract that a human wrote in their own words. This makes it difficult to determine how far state-of-the-art extractive methods are from the upper bound that an ideal extractive method might achieve. In addition, the performance of an extractive method differs from domain to domain, which makes it difficult to benchmark. Previous studies construct an ideal sentence-based extract of a document, that is, the extract that gives the best score on a given metric, by exhaustive search over all possible sentence combinations of a given length. They then use the performance of that extract as the sentence-based upper bound. However, this only applies to short texts. For long texts and multiple documents, previous studies rely on manual effort, which is expensive and time-consuming. In this paper, we propose nine fast heuristic methods to generate near-ideal sentence-based extracts for long texts and multiple documents. Furthermore, we propose an n-gram construction method to construct the word-based upper bound. A percentage ranking method is used to benchmark different extractive methods across different corpora. In the experiments, five different corpora are used. The results show that the near upper bounds constructed by the proposed methods are close to those obtained by exhaustive search, but the proposed methods are much faster. Six general extractive summarization methods were also assessed to demonstrate the gap between their performance and the near upper bounds. (C) 2017 Elsevier Ltd. All rights reserved.
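As a rough illustration of the contrast the abstract draws between exhaustive search and a fast heuristic, the sketch below builds a sentence-based extract both ways. It is not the authors' method: the abstract does not specify the nine heuristics, so the greedy selection shown here is only an assumed stand-in, and the scoring function `unigram_recall` (a simplified ROUGE-1-style recall), the function names, and the toy sentences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_recall(extract_sents, reference):
    """Clipped unigram recall of the extract against the reference
    (a simplified, ROUGE-1-like score assumed for this sketch)."""
    ref = Counter(reference.lower().split())
    cand = Counter(w for s in extract_sents for w in s.lower().split())
    overlap = sum(min(cand[w], n) for w, n in ref.items())
    return overlap / max(1, sum(ref.values()))


def exhaustive_extract(sentences, reference, k):
    """Score every k-sentence combination: exact, but the number of
    combinations grows combinatorially with document length."""
    return max(combinations(sentences, k),
               key=lambda combo: unigram_recall(combo, reference))


def greedy_extract(sentences, reference, k):
    """Assumed heuristic: k times, add the sentence that most improves
    the score of the extract built so far."""
    chosen, remaining = [], list(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda s: unigram_recall(chosen + [s], reference))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data, invented for illustration only.
sentences = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "a cat likes warm milk",
    "the weather is cold today",
]
reference = "the cat sat on the mat and likes warm milk"

ideal = list(exhaustive_extract(sentences, reference, 2))
near_ideal = greedy_extract(sentences, reference, 2)
print(ideal, unigram_recall(ideal, reference))
print(near_ideal, unigram_recall(near_ideal, reference))
```

On this toy input the greedy pass happens to recover the same extract as the exhaustive search while scoring far fewer candidate sets. In general a greedy choice is not guaranteed to match the exhaustive optimum, which is why the resulting score is better described as a near upper bound.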

Bibliographic record

Source
Expert Systems with Applications, 2017, No. 30, pp. 439-463 (25 pages)
Authors
Wang W. M.; Li Z.; Wang J. W.; Zheng Z. H.
Affiliations

Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

Guangdong Univ Technol, Sch Electromech Engn, Guangdong Prov Key Lab Comp Integrated Mfg Syst, Guangzhou 510006, Guangdong, Peoples R China;

Hong Kong Polytech Univ, Dept Ind & Syst Engn, Knowledge Management & Innovat Res Ctr, Hung Hom 999077, Hong Kong, Peoples R China;

Original format: PDF
Language: English
Keywords
Extractive text summarization; Upper bound construction; Ideal extracts construction; Summarization evaluation;
