COMPARATIVE STUDY OF K-MEANS AND K-MEANS++ CLUSTERING ALGORITHMS ON CRIME DOMAIN | Science Publications

Bashar Aubaidan; Masnizah Mohd; Mohammed Albared

Abstract

> This study presents the results of an experimental comparison of two document clustering techniques, k-means and k-means++. In particular, we compare these two main approaches for clustering crime documents. The drawback of k-means is that the user needs to define the initial centroid points. This becomes more critical in document clustering, because each center point is represented by words and calculating the distance between words is not a trivial task. To overcome this problem, k-means++ was introduced to find good initial center points. Since k-means++ has not been applied before in crime document clustering, this study presents a comparative study of k-means and k-means++ to investigate whether the initialization process in k-means++ helps to obtain better results than k-means. We propose the k-means++ clustering algorithm to identify the best seeds for the initial cluster centers when clustering crime documents. The method of this study includes a pre-processing phase, which in turn involves tokenization, stop-word removal and stemming. In addition, we evaluate the impact of two similarity/distance measures (cosine similarity and the Jaccard coefficient) on the results of the two clustering algorithms. Experimental results on several settings of the crime data set showed that, by identifying the best seeds for the initial cluster centers, k-means++ performs significantly better than k-means (at the 95% confidence level). These results demonstrate the accuracy of the k-means++ clustering algorithm in clustering crime documents.
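
The two algorithms differ only in how the initial centers are chosen. The following is a minimal sketch, assuming dense numeric document vectors and squared Euclidean distance (the study pairs the algorithms with cosine and Jaccard measures, and its exact implementation is not given here). It contrasts uniform random seeding with the k-means++ seeding rule, in which each new center is sampled with probability proportional to its squared distance D(x)^2 from the nearest center already chosen. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeds(points, k, rng):
    """Plain k-means seeding: k points chosen uniformly at random."""
    return [list(p) for p in rng.sample(points, k)]


def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding: sample each new center with probability ~ D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2)[0]))
    return centers


def lloyd(points, centers, max_iter=100):
    """Standard k-means (Lloyd) iterations starting from the given centers."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Within-cluster sum of squared errors, the objective k-means minimizes."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy 2-D "document vectors": three well-separated groups.
    points = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
              for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    for name, seeder in [("k-means", random_seeds), ("k-means++", kmeans_pp_seeds)]:
        seeds = seeder(points, 3, random.Random(7))
        print(name, "SSE after Lloyd:", round(sse(points, lloyd(points, seeds)), 3))
```

Both seeding functions feed the same Lloyd iterations, so any difference in the final within-cluster error comes from initialization alone, which mirrors the comparison the study sets up. Because points that are already centers have D(x)^2 = 0, k-means++ never re-picks them, whereas uniform seeding can place two centers in the same group.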

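
The pre-processing phase and the two measures named in the abstract can be sketched as follows. This is a minimal illustration, not the study's pipeline: the stop-word list is a tiny hypothetical one, the suffix stripper is a crude stand-in for a real stemmer such as Porter's, cosine similarity is computed on raw term-frequency vectors (no TF-IDF weighting is assumed), and the Jaccard coefficient is computed on sets of distinct terms.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "to",
              "for", "was", "were", "is", "by", "with"}

# Crude suffix stripping, a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in SUFFIXES:
        if token.endswith(suf) and len(token) - len(suf) >= 3:
            return token[: -len(suf)]
    return token


def preprocess(text):
    """Tokenization -> stop-word removal -> stemming."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard_coefficient(terms_a, terms_b):
    """|A & B| / |A | B| over the sets of distinct terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    d1 = preprocess("The suspect was arrested for robbing a bank")
    d2 = preprocess("Police arrested two suspects after the bank robbery")
    print(d1, d2)
    print("cosine:", cosine_similarity(d1, d2))
    print("jaccard:", jaccard_coefficient(d1, d2))
```

Either similarity s can be turned into a distance for clustering as 1 - s. With this toy stemmer, "robbing" and "robbery" are not conflated, which shows how the choice of stemmer can change the similarity values the clustering step sees.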

Bibliographic Record

Source: Journal of computer sciences | 2014, No. 7
Authors: Bashar Aubaidan; Masnizah Mohd; Mohammed Albared
Original format: PDF
CLC classification: Computing technology, computer technology
Similar Documents

1. COMPARATIVE STUDY OF K-MEANS AND K-MEANS++ CLUSTERING ALGORITHMS ON CRIME DOMAIN [J]. Bashar Aubaidan, Masnizah Mohd, Mohammed Albared. Journal of computer sciences. 2014, No. 7.
2. Content Based Medical Image Retrieval with Texture Content Using Gray Level Co-occurrence Matrix and K-Means Clustering Algorithms [J]. B. Ramamurthy, K. R. Chandran. Journal of computer sciences. 2012, No. 7.
3. Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points [J]. T. Santhanam, T. Velmurugan. Journal of computer sciences. 2010, No. 3.
4. A comparative study of K-Means, K-Means++ and Fuzzy C-Means clustering algorithms [C]. Akanksha Kapoor, Abhishek Singhal. International Conference on Computational Intelligence Communication Technology. 2017.
5. Hardware Implementation and Performance Evaluation of K-Means and K-Means++ Clustering Algorithms [D]. Singh, Manisha. 2019.
6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.
7. COMPARATIVE STUDY OF K-MEANS AND K-MEANS++ CLUSTERING ALGORITHMS ON CRIME DOMAIN [O]. Bashar Aubaidan, Masnizah Mohd, Mohammed Albared. 2015.