Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

Abstract

Clustering non-Euclidean data is difficult, and one of the most widely used algorithms besides hierarchical clustering is Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but it does not exist for arbitrary dissimilarities. PAM uses the medoid instead, the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second ('SWAP') phase of the algorithm, but will still find the same results as the original PAM algorithm. If we slightly relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by performing up to k swaps in each iteration. With the substantially faster SWAP, we can now explore faster initialization strategies. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications.
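
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of a medoid and of a brute-force SWAP phase over a precomputed dissimilarity matrix. It is not the authors' accelerated method: it recomputes the full cost for every candidate swap, so it is slower than even the original PAM, and it only illustrates what SWAP optimizes. All function names (total_deviation, medoid, naive_pam_swap) and the toy data are assumptions made for this illustration.

```python
from itertools import product


def total_deviation(dist, medoids):
    """Sum over all objects of the dissimilarity to their nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def medoid(dist, members):
    """Member with the smallest summed dissimilarity to the cluster members."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def naive_pam_swap(dist, medoids):
    """Brute-force SWAP: apply the best (medoid, non-medoid) swap until none helps."""
    medoids = list(medoids)
    best = total_deviation(dist, medoids)
    while True:
        best_swap = None
        for pos, cand in product(range(len(medoids)), range(len(dist))):
            if cand in medoids:
                continue  # only non-medoids may be swapped in
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_deviation(dist, trial)  # recomputed from scratch (slow)
            if cost < best:
                best, best_swap = cost, trial
        if best_swap is None:  # no swap lowers the cost: local optimum reached
            return sorted(medoids), best
        medoids = best_swap


# Hypothetical toy data: six points on a line, absolute difference as dissimilarity.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(naive_pam_swap(dist, [0, 1]))  # -> ([1, 4], 4)
```

Because only the matrix dist is consulted, the same sketch works for any dissimilarity, which is the property the abstract highlights for medoids.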

Bibliographic record

Source
International Conference on Similarity Search and Applications | 2019 | pp. 171-187 | 17 pages
Conference location: Newark (US)
Authors
Erich Schubert; Peter J. Rousseeuw;
Author affiliations

Technische Universitaet Dortmund, Dortmund, Germany;

Department of Mathematics, KU Leuven, Leuven, Belgium;

Original format: PDF
Keywords
Cluster analysis; k-Medoids; PAM; CLARA; CLARANS;

Similar literature

1. Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms [J]. Schubert Erich, Rousseeuw Peter J. Information Systems. 2021, issue Nova

2. IMPROVING CUSTOMER CLUSTERING BY OPTIMAL SELECTION OF CLUSTER CENTROIDS IN K-MEANS AND K-MEDOIDS ALGORITHMS [J]. SHAHLA MOUSAVI, FARSAD ZAMANI BOROUJENI, SAEED ARYANMEHR. Journal of Theoretical and Applied Information Technology. 2020, issue 18

3. Ranked k-medoids: A fast and accurate rank-based partitioning algorithm for clustering large datasets [J]. Seyed Mohammad Razavi Zadegan, Mehdi Mirzaie, Farahnaz Sadoughi. Knowledge-Based Systems. 2013, issue feba

4. Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms [C] . Erich Schubert, Peter J. Rousseeuw International Conference on Similarity Search and Applications . 2019

5. Clustering Students' Metacognitive Beliefs: Comparing the Results of K-Means and K-Medoids Algorithms [D] . Bukoski, Elizabeth 2018

6. Clustering and Characterization of the Lactation Curves of Dairy Cows Using K-Medoids Clustering Algorithm [O] . Mingyung Lee, Seonghun Lee, Jaehwa Park, 2020

7. Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms [O] . Erich Schubert, Peter J. Rousseeuw 2021

