K-means is one of the most important data mining techniques for scientists who want to analyze their data. However, k-means cannot handle noise points. This paper proposes a technique that can be applied to a k-means clustering result to exclude noise points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with different strategies for initializing k-means and for determining the number of clusters. Moreover, it is completely parameter-free. We tested the technique on artificial and real data sets to compare its performance with that of other noise-excluding techniques for k-means.