Adapt Clustering Methods for Arabic Documents

Boumedyen Shannaq

Abstract

This paper develops a new clustering method (FWC) and proposes a new approach to filtering data collected from internet resources. The focus of the paper is clustering, which groups data instances into subsets such that similar instances are placed together while dissimilar instances fall into different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled, which reduces the very large volume of retrieved data. This is done by removing dissimilar text files and grouping similar documents into homogeneous clusters. Arabic text files totaling 974 MB were collected, processed, analyzed and filtered using common clustering methods. These clustering methods are presented in five categories: hierarchical, partitioning, density-based, model-based and soft-computing methods. Following the methods, the challenges of clustering large data sets are discussed and tested with the proposed method. Two experiments were conducted to establish the effectiveness of the FWC method, and the results show that it produced better results and outperformed existing clustering methods.
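
The abstract does not specify how FWC works, so the following is only a minimal sketch of the general idea it describes: grouping similar documents and discarding dissimilar ones. It is not the paper's method. The whitespace tokenizer, the cosine similarity measure, the greedy single-pass ("leader") grouping, and the names `term_vector`, `cosine`, `cluster_and_filter`, `threshold` and `min_size` are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts. Whitespace tokenization is an assumption;
    real Arabic text would likely need normalization and stemming first."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_and_filter(docs, threshold=0.5, min_size=2):
    """Greedy single-pass clustering followed by outlier removal.

    Each document joins the first cluster whose leader (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    Clusters smaller than `min_size` are treated as dissimilar files
    and reported separately instead of being kept.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([i])
    kept = [c for c in clusters if len(c) >= min_size]
    removed = [i for c in clusters if len(c) < min_size for i in c]
    return kept, removed
```

Calling `cluster_and_filter(docs)` on a list of document strings returns the retained clusters (as lists of indices into `docs`) and the indices of the documents filtered out. The result of a single-pass scheme depends on document order and on the chosen threshold, which is one reason hierarchical or partitioning methods are often preferred for larger collections.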

Publication details

Source: American Journal of Information Systems, 2013, Issue 1, 5 pages
Author: Boumedyen Shannaq
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
