An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application

Fouad Khan

Abstract

K-means is one of the most widely used clustering algorithms across disciplines, especially for large datasets. However, the method is known to be highly sensitive to the initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems, though, such as classifying georeferenced data for mapping applications, standardization of the clustering methodology, specifically the ability to arrive at the same cluster assignment on every run of the method (i.e., replicability), may be of greater significance than any perceived measure of accuracy, especially when the solution is known to be non-unique, as in the case of k-means clustering. Here we propose a simple initial seed selection algorithm for k-means clustering along one attribute that draws initial cluster boundaries along the "deepest valleys", or greatest gaps, in the dataset. It thus incorporates a measure to maximize the distance between consecutive cluster centers, which augments the conventional k-means optimization for minimum distance between a cluster center and its cluster members. Unlike existing initialization methods, it introduces no additional parameters or degrees of freedom to the clustering algorithm. This improves the replicability of cluster assignments by as much as 100% over k-means and k-means++, reducing the variance over different runs to virtually zero. Further, the proposed method is more computationally efficient than k-means++ and, in some cases, more accurate.
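
The abstract describes the seeding idea only at a high level, so the following is a minimal Python sketch of one plausible reading, not the author's published procedure. It assumes a single numeric attribute, places the initial cluster boundaries at the k - 1 widest gaps between consecutive sorted values, and uses the mean of each resulting segment as the initial center. The function name `gap_based_seeds` and the leftmost-first tie-breaking rule are assumptions made for illustration.

```python
from typing import List, Sequence


def gap_based_seeds(values: Sequence[float], k: int) -> List[float]:
    """Return k deterministic initial centers for 1-D k-means (illustrative sketch)."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    # Gap i is the distance between xs[i] and xs[i + 1].
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    # Keep the k - 1 widest gaps; ties go to the leftmost gap (an assumption)
    # so that the same input always yields the same boundaries.
    widest = sorted(gaps, key=lambda g: (-g[0], g[1]))[: k - 1]
    cuts = sorted(i for _, i in widest)
    seeds: List[float] = []
    start = 0
    # Each cut index closes one segment; the last segment ends at the final value.
    for cut in cuts + [len(xs) - 1]:
        segment = xs[start : cut + 1]
        seeds.append(sum(segment) / len(segment))
        start = cut + 1
    return seeds


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
    print(gap_based_seeds(data, 3))  # [2.0, 11.0, 21.0]
```

Because nothing in this sketch depends on a random draw or on the input order, repeated runs on the same data return identical seeds, which is the replicability property the abstract emphasizes.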

Bibliographic details

Source: Applied Soft Computing, 2012, Issue 11, 3 pages
Author: Fouad Khan
Original format: PDF
Language: English
Chinese Library Classification: Computer software
Keywords: Classification; Grouping of data; Natural breaks; Jenks natural breaks; Natural breaks ArcGIS; Jenks ArcGIS
