A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities

Mohammad Bidoki; Mohammad R. Moosavi; Mostafa Fakhrahmad

Abstract

Today, given the vast amount of textual data, automated extractive text summarization is one of the most common and practical techniques for organizing information. Extractive summarization selects the most appropriate sentences from the text and provides a representative summary. Sentences, as individual textual units, are usually too short for major text processing techniques to perform well. Hence, it seems vital to bridge the gap between short text units and conventional text processing methods. In this study, we propose a semantic method for implementing an extractive multi-document summarizer system by using a combination of statistical, machine learning based, and graph-based methods. It is a language-independent and unsupervised system. The proposed framework learns the semantic representation of words from a set of given documents via the word2vec method. It expands each sentence through an innovative method with the most informative and least redundant words related to the main topic of the sentence. Sentence expansion implicitly performs word sense disambiguation and tunes the conceptual densities towards the central topic of each sentence. Then, it estimates the importance of sentences by using the graph representation of the documents. To identify the most important topics of the documents, we propose an inventive clustering approach. It autonomously determines the number of clusters and their initial centroids, and clusters sentences accordingly. The system selects the best sentences from appropriate clusters for the final summary with respect to information salience, minimum redundancy, and adequate coverage. A set of extensive experiments on the DUC2002 and DUC2006 datasets was conducted to investigate the proposed scheme. Experimental results showed that the proposed sentence expansion algorithm and clustering approach could considerably enhance the performance of the summarization system. Also, comparative experiments demonstrated that the proposed framework outperforms most of the state-of-the-art summarizer systems and can impressively assist the task of extractive text summarization.
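The abstract does not give the expansion or ranking formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes hand-made 2-D word vectors in place of embeddings learned by word2vec, expands a sentence with the vocabulary words nearest to its centroid (ignoring the informativeness and redundancy criteria the abstract mentions), and scores sentences by summed cosine similarity on a fully connected sentence graph. The names `VECTORS`, `expand`, and `rank` are hypothetical.

```python
import math

# Toy 2-D vectors standing in for embeddings that word2vec would learn from
# the input documents (hypothetical values, for illustration only).
VECTORS = {
    "bank": (0.9, 0.1), "loan": (1.0, 0.0), "credit": (0.95, 0.05),
    "river": (0.0, 1.0), "water": (0.05, 0.95), "shore": (0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(words):
    """Mean vector of the known words in a sentence."""
    vecs = [VECTORS[w] for w in words if w in VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def expand(words, k=1):
    """Append the k vocabulary words closest to the sentence centroid."""
    center = centroid(words)
    candidates = [w for w in VECTORS if w not in words]
    candidates.sort(key=lambda w: cosine(VECTORS[w], center), reverse=True)
    return list(words) + candidates[:k]


def rank(sentences):
    """Score each expanded sentence by its summed similarity to the others."""
    centers = [centroid(expand(s)) for s in sentences]
    return [
        sum(cosine(c, d) for j, d in enumerate(centers) if j != i)
        for i, c in enumerate(centers)
    ]


# The ambiguous word "bank" is pulled towards the topic of its context.
print(expand(["bank", "loan"]))            # ['bank', 'loan', 'credit']
print(expand(["bank", "river", "water"]))  # ['bank', 'river', 'water', 'shore']
```

In this toy setting the expansion word depends on the rest of the sentence, which is one way to read the claim that expansion implicitly disambiguates word senses. A fuller implementation would train real embeddings, weight candidate words by informativeness and redundancy, and add the clustering and selection stages the abstract describes.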

Bibliographic details

Source
Information Processing & Management, 2020, Issue 6, pp. 102341.1-102341.25 (25 pages)
Authors
Mohammad Bidoki; Mohammad R. Moosavi; Mostafa Fakhrahmad
Author affiliations

Department of Computer Science & Engineering & IT, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran (all three authors)

Original format: PDF
Text language: English
Keywords
Multi-document Extractive Summarization; Sentence Expansion; Conceptual Density Tuning; Word Embedding; Text Clustering; Language-independent Approach;
