Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy

Abstract

Arabic is one of the most semantically and syntactically complex languages in the world. Text summarization is a key challenge in text mining, and we propose an unsupervised score-based method that combines the vector space model, continuous bag of words (CBOW), clustering, and a statistical method. Multidocument text summarization suffers from noisy data, redundancy, diminished readability, and sentence incoherency. In this study, we adopt a preprocessing strategy to solve the noise problem and use the word2vec model for two purposes: first, to map words to fixed-length vectors and, second, to obtain the semantic relationships between vectors based on their dimensions. Similarly, we use the k-means algorithm for two purposes: (1) selecting the distinctive documents and tokenizing them into sentences, and (2) in a second iteration, selecting the key sentences based on a similarity metric to overcome the redundancy problem and generate the initial summary. Lastly, we use weighted principal component analysis (W-PCA) to map the sentences' encoded weights based on a list of features. This selects the highest set of weights, which corresponds to the important sentences, to address the incoherency and readability problems. We adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the evaluation measure to examine our proposed technique and compare it with state-of-the-art methods. Finally, an experiment on the Essex Arabic Summaries Corpus (EASC) using the ROUGE-1 and ROUGE-2 metrics showed promising results in comparison with existing methods.
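The abstract names ROUGE-1 and ROUGE-2 as its evaluation metrics. As a rough illustration of what those scores measure, the following sketch computes ROUGE-N recall as the clipped n-gram overlap between a candidate summary and a reference summary, divided by the number of n-grams in the reference. It is not the authors' evaluation code: the function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the English example sentences are all assumptions made for illustration. Real Arabic evaluation would need proper tokenization and normalization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap / number of reference n-grams.

    Tokenization is plain whitespace splitting (an assumption of this sketch).
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate (clipping).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, not taken from EASC.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, n=1)  # 5 of 6 unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 5 bigrams matched
print(f"ROUGE-1 recall: {r1:.3f}, ROUGE-2 recall: {r2:.3f}")
```

A redundant summary that repeats the same sentence gains nothing under this measure, because clipping caps each match at the count seen in the reference.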

Bibliographic details

Authors: Samer Abdulateef; Naseer Ahmed Khan; Bolin Chen; Xuequn Shang
Year: 2020
Original format: PDF
Language: English

Similar literature

1. Arabic Text Summarization Based on Latent Semantic Analysis to Enhance Arabic Documents Clustering [J]. Hanane Froud, Abdelmonaime Lachkar, Said Alaoui Ouatik. International Journal of Data Mining & Knowledge Management Process. 2013, No. 1
2. ArA*summarizer: An Arabic text summarization system based on subtopic segmentation and using an A* algorithm for reduction [J]. Expert Systems. 2020, No. 2
3. A Hybrid Arabic Text Summarization Technique Based on Text Structure and Topic Identification [J]. Bassam H. Hammo, Hani Abu-Salem, Martha W. Evens. International Journal of Computer Processing of Languages. 2011, No. 1
4. Improving graph based multidocument text summarization using an enhanced sentence similarity measure [C]. Sarkar Kamal, Saraf Khushbu, Ghosh Avishikta. 2015 IEEE 2nd International Conference on Recent Trends in Information Systems. 2015
5. Information fusion for multidocument summarization: Paraphrasing and generation [D]. Barzilay, Regina. 2003
6. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy [O]. Nedunchelian Ramanujam, Manivannan Kaliappan. 2016
7. Arabic Text Summarization Based on Latent Semantic Analysis to Enhance Arabic Documents Clustering [O]. Hanane Froud, Abdelmonaime Lachkar, Said Alaoui Ouatik. 2013
8. Multidocument Summarization via Information Extraction [R]. White, M., Korelsky, T., Cardie, C. 2001
