首页> 外文会议>International Conference on Similarity Search and Applications >Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
【24h】

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

机译:更快的k-Medoids聚类:改进PAM,CLARA和CLARANS算法

获取原文

摘要

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is the popular algorithm Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean—as used in k-means—is a good estimator for the cluster center, but this does not exist for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second ('SWAP') phase of the algorithm, but will still find the same results as the original PAM algorithm. If we slightly relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by performing up to k swaps in each iteration. With the substantially faster SWAP, we can now explore faster intialization strategies. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications.
机译:对非欧几里得数据进行聚类很困难,除分层聚类外,最常用的算法之一是流行的算法(围绕类群进行分区)(PAM),也简称为k-类群。在欧几里得几何学中,均值(用于k均值)是聚类中心的一个很好的估计值,但是对于任意的相异性却不存在。 PAM改用medoid,即与群集中所有其他对象的相似度最小的对象。这种中心性的概念可以与任何(不相似)相似性一起使用,因此与许多领域和应用都高度相关。 PAM的关键问题是运行时间成本高。我们提出了对PAM算法的修改,该修改在算法的第二个('SWAP')阶段实现了O(k)倍加速,但仍会发现与原始PAM算法相同的结果。如果我们稍微放松执行交换的选择(同时保持相当的质量),则可以通过在每次迭代中执行最多k个交换来进一步加速算法。借助明显更快的SWAP,我们现在可以探索更快的初始化策略。我们还展示了CLARA和CLARANS算法如何从建议的修改中受益。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号