SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders

Joshi Akanksha; Fidalgo E.; Alegre E.; Fernandez-Robles Laura


Abstract

In this paper, we propose SummCoder, a novel methodology for generic extractive text summarization of single documents. The approach generates a summary according to three sentence selection metrics that we formulate: sentence content relevance, sentence novelty, and sentence position relevance. Sentence content relevance is measured using a deep auto-encoder network, and the novelty metric is derived by exploiting the similarity among sentences represented as embeddings in a distributed semantic space. The sentence position relevance metric is a hand-designed feature that assigns more weight to the first few sentences through a dynamic weight calculation function regulated by the document length. Furthermore, a sentence ranking and selection technique is developed to generate the document summary by ranking the sentences according to the final score obtained by fusing the three sentence selection metrics. We also introduce a new summarization benchmark, the Tor Illegal Documents Summarization (TIDSumm) dataset, intended especially to assist Law Enforcement Agencies (LEAs); it contains two sets of manually created ground truth summaries for 100 web documents extracted from onion websites in the Tor (The Onion Router) network. Empirical results show that, on the DUC 2002, Blog Summarization, and TIDSumm datasets, our text summarization approach obtains comparable or better performance than state-of-the-art methods for different ROUGE metrics. © 2019 Elsevier Ltd. All rights reserved.
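
The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names, the exponential position decay scaled by the square root of the document length, the novelty rule (one minus the highest cosine similarity to any earlier sentence), and fusion by multiplication are all hypothetical stand-ins. The content relevance scores, which the paper obtains from a deep auto-encoder, are simply passed in as precomputed numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def position_scores(n):
    """Hypothetical position weights: earlier sentences score higher, and the
    decay is slower for longer documents. Not the paper's actual function."""
    return [math.exp(-i / math.sqrt(n)) for i in range(n)]


def novelty_scores(embeddings):
    """Hypothetical novelty: 1 minus the highest cosine similarity to any
    earlier sentence; the first sentence is treated as fully novel."""
    scores = []
    for i, emb in enumerate(embeddings):
        if i == 0:
            scores.append(1.0)
        else:
            closest = max(cosine(emb, prev) for prev in embeddings[:i])
            scores.append(max(0.0, 1.0 - closest))
    return scores


def summarize(sentences, embeddings, content_relevance, k):
    """Rank sentences by a fused score and return the top k in document order.
    Fusion by multiplication is an assumption, not the paper's stated rule."""
    n = len(sentences)
    pos = position_scores(n)
    nov = novelty_scores(embeddings)
    final = [content_relevance[i] * nov[i] * pos[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: final[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

With three sentences where the second has the same embedding as the first, calling `summarize(["A", "B", "C"], [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.9, 0.9, 0.8], 2)` drops the redundant second sentence, because its novelty score under this sketch is zero.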

Bibliographic record

Source
Expert Systems with Applications | 2019, Issue 9 | pp. 200-215 | 16 pages
Authors
Joshi Akanksha; Fidalgo E.; Alegre E.; Fernandez-Robles Laura
Author affiliations

Univ Leon Leon Spain|INCIBE Spanish Natl Inst Cybersecur Leon Spain|CDAC Pune Maharashtra India;

Univ Leon Leon Spain|INCIBE Spanish Natl Inst Cybersecur Leon Spain;

Univ Leon Leon Spain|INCIBE Spanish Natl Inst Cybersecur Leon Spain;

Univ Leon Leon Spain|INCIBE Spanish Natl Inst Cybersecur Leon Spain;

Original format: PDF
Text language: English
Keywords
Extractive text summarization; Auto-encoder; Deep learning; Sentence embedding; TOR darknet; Extractive summarization;

Similar literature

1. SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders [J]. Joshi Akanksha, Fidalgo E., Alegre E. Expert Systems with Applications. 2019, Issue SEPa.
2. A Framework for Extractive Text Summarization Based on Deep Learning Modified Neural Network Classifier [J]. Muthu Balaanand, Sivaparthipan C. B., Kumar Priyan Malarvizhi. ACM Transactions on Asian and Low-Resource Language Information Processing. 2021, Issue 3.
3. Evaluation of Unsupervised Learning based Extractive Text Summarization Technique for Large Scale Review and Feedback Data [J]. Jai Prakash Verma, Atul Patel. Indian Journal of Science and Technology. 2017, Issue 17.
4. Extractive Text Summarization Using Deep Auto-encoders [C]. K. Arjun, M. Hariharan, Pooja Anand. International Conference on Advanced Computing, Networking, and Informatics. 2018.
5. Query-Focused Extractive Summarization Based on Deep Learning: Comparison of Similarity Measures for Pseudo Ground Truth Generation [D]. Yuliska. 2019.
6. Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques [O]. Ramakanth Kavuluru, Sifei Han, Daniel Harris.
7. A Novel Framework Using Deep Auto-Encoders Based Linear Model for Data Classification [O]. Ahmad M. Karim, Hilal Kaya, Mehmet Serdar Güzel. 2020.
