Journal: Knowledge-Based Systems

Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets

Abstract

Clustering analysis is the process of dividing a set of objects into non-overlapping subsets. Each subset is a cluster, such that objects in the cluster are similar to one another and dissimilar to the objects in the other clusters. Most algorithms in the partitioning approach to clustering suffer from becoming trapped in local optima and from sensitivity to initialization and outliers. In this paper, we introduce a novel partitioning algorithm whose initialization does not lead it to a local optimum and which can find all the Gaussian-shaped clusters, provided it is given the right number of clusters. In this algorithm, the similarity between each pair of objects is computed only once, and updating the medoids in each iteration costs O(k × m), where k is the number of clusters and m is the number of objects needed to update the medoids of the clusters. The algorithm is compared with two other partitioning algorithms using four well-known external validation measures over seven standard datasets. The results for the larger datasets show the superiority of the proposed algorithm over the two other algorithms in terms of speed and accuracy.
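
The two cost-related ideas in the abstract (pairwise similarities computed once, and a medoid update that only touches m objects per cluster) can be illustrated with a minimal sketch. This is not the paper's Ranked k-medoids procedure, whose ranking details the abstract does not give. It is a generic k-medoids variant written under stated assumptions: Euclidean distance stands in for the similarity measure, and the function names `pairwise_distances`, `update_medoids`, and `cluster` are hypothetical. In this sketch each update costs O(k × m²) table lookups rather than the paper's O(k × m), but like the paper's bound it depends only on k and m, not on the total number of objects.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix; computed once and reused by every iteration."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def update_medoids(D, neighbors, medoids, m):
    """Move each medoid to the most central of its m nearest objects.

    neighbors[i] lists all object indices sorted by distance from object i.
    (Assumed update rule for illustration; not taken from the paper.)
    """
    new_medoids = []
    for c in medoids:
        candidates = neighbors[c, :m]            # includes c itself (distance 0)
        sub = D[np.ix_(candidates, candidates)]  # m x m lookups, no new distances
        new_medoids.append(int(candidates[sub.sum(axis=1).argmin()]))
    return new_medoids


def cluster(X, k, m, n_iter=20, seed=0):
    """Random initial medoids, then repeated local medoid updates."""
    rng = np.random.default_rng(seed)
    D = pairwise_distances(X)           # the only place distances are computed
    neighbors = np.argsort(D, axis=1)   # per-object neighbor ranking, also built once
    medoids = [int(i) for i in rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        updated = update_medoids(D, neighbors, medoids, m)
        if updated == medoids:          # stop when no medoid moves
            break
        medoids = updated
    labels = D[:, medoids].argmin(axis=1)  # assign each object to its nearest medoid
    return medoids, labels
```

Calling `cluster(X, k=2, m=3)` on an `(n, d)` array returns the medoid indices and one cluster label per object. Unlike the algorithm the abstract describes, this simplified update can still depend on the random initialization, so it should be read only as an illustration of the cost structure.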