Clustering method based on data division and partition

卢志茂; 刘晨; 张春祥; 王蕾

Abstract

Many classical clustering algorithms perform well when their own prerequisites are met but do not scale well to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to address this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form the local feature set. The local feature set is then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data set is assigned according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering algorithms. Because it is insensitive to data dimensionality, data distribution, and the number of natural clusters, DP has a wide range of applications in clustering VLDS.
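The three stages described in the abstract (block-wise local features, a second aggregation round, minimum-distance assignment) can be illustrated with a minimal sketch. The abstract does not say how the eigenvector of a block is extracted, so this sketch assumes k-means centroids as a stand-in for both the local and the global features. The names `farthest_first_init`, `kmeans`, `dp_cluster`, `n_blocks`, `local_k`, and `global_k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def farthest_first_init(points, k):
    """Pick k initial centers deterministically: the first point, then
    repeatedly the point farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(1, k):
        chosen = np.array(centers)
        dists = np.linalg.norm(points[:, None, :] - chosen[None, :, :], axis=2)
        centers.append(points[dists.min(axis=1).argmax()])
    return np.array(centers, dtype=float)


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; assumed here as the feature extractor."""
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def dp_cluster(data, n_blocks, local_k, global_k):
    """Sketch of a divide-then-aggregate clustering pipeline."""
    # Stage 1: cut the data into blocks and extract local features per block.
    blocks = np.array_split(data, n_blocks)
    local_features = np.vstack([kmeans(block, local_k) for block in blocks])
    # Stage 2: aggregate the local feature set into global features.
    global_centers = kmeans(local_features, global_k)
    # Stage 3: assign every point to its nearest global feature.
    dists = np.linalg.norm(data[:, None, :] - global_centers[None, :, :], axis=2)
    return dists.argmin(axis=1), global_centers
```

In this sketch only one block, plus the much smaller local feature set, has to be clustered at a time, which is one plausible reading of why a scheme of this kind would scale to VLDS. Each block must contain at least `local_k` points, and `n_blocks * local_k` must be at least `global_k`.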

