Efficient Parallel Hierarchical Clustering

Abstract

Hierarchical agglomerative clustering (HAC) is a common clustering method that outputs a dendrogram showing all N levels of agglomeration, where N is the number of objects in the data set. High time and memory complexities are among the major bottlenecks in its application to real-world problems. Parallel algorithms have been proposed in the literature to overcome these limitations. However, as this paper shows, existing parallel HAC algorithms are inefficient because they partition the data ineffectively. We first show that HAC follows a rule whereby most agglomerations have very small dissimilarity and only a small portion towards the end have large dissimilarity. Partially overlapping partitioning (POP) exploits this principle to obtain efficient yet accurate HAC algorithms. The total number of dissimilarities is reduced by a factor close to the number of cells in the partition. We present pPOP, the parallel version of POP, which is implemented on a shared-memory multiprocessor architecture. Extensive theoretical analysis and experimental results show that pPOP gives close to linear speedup and significantly outperforms the existing parallel algorithms in both CPU time and memory requirements.
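
The observation in the abstract that most agglomerations have very small dissimilarity can be illustrated with a minimal sketch. This is not the paper's POP or pPOP algorithm, whose details the abstract does not give; it is a naive single-linkage HAC on synthetic data. The linkage choice, the helper names `naive_hac` and `make_blobs`, and the data set are all assumptions made for illustration.

```python
import math
import random


def naive_hac(points):
    """Naive single-linkage HAC (illustrative only, not POP).

    Returns the dissimilarity of each merge, in merge order.
    N points produce N - 1 merges.
    """
    clusters = [[p] for p in points]
    merge_dists = []
    while len(clusters) > 1:
        best = (math.inf, -1, -1)
        # Scan every pair of clusters for the smallest single-link distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
        merge_dists.append(d)
    return merge_dists


def make_blobs(centers, per_blob, spread, seed=0):
    """Hypothetical helper: points drawn uniformly in a square around each center."""
    rng = random.Random(seed)
    return [
        (cx + rng.uniform(-spread, spread), cy + rng.uniform(-spread, spread))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


# Four well-separated groups of ten points each (synthetic, assumed data).
points = make_blobs([(0, 0), (10, 0), (0, 10), (10, 10)], per_blob=10, spread=1.0)
dists = naive_hac(points)

# Count merges whose dissimilarity is below half of the largest one.
small = sum(1 for d in dists if d < 0.5 * max(dists))
print(f"{len(points)} points, {len(dists)} merges, {small} with small dissimilarity")
```

On this data only the final few merges, those joining the separated groups, have large dissimilarity. The sketch also computes every pairwise dissimilarity at every step, which is the cost the abstract says partitioning is meant to reduce.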

Bibliographic details

Source
2004 | pp. 363-371 | 9 pages
Conference location: Pisa (IT)
Authors
Manoranjan Dash; Simona Petrutiu; Peter Scheuermann;
Author affiliation

Department of Information Systems, School of Computer Engineering, Nanyang Technological University, Singapore 639798;

Conference organizer: (not listed)
Original format: PDF
Language of text: English
CLC classification: Theory, methods;
Keywords
hierarchical agglomerative clustering; partitioning; parallel algorithm; shared memory architecture;

Similar literature

1. Parallel hierarchical architectures for efficient consensus clustering on big multimedia cluster ensembles [J]. Information Sciences: An International Journal. 2020.
2. Efficient parallel hierarchical clustering algorithms [J]. Rajasekaran S. IEEE Transactions on Parallel and Distributed Systems. 2005, no. 6.
3. Parallelization of a graph-cut based algorithm for hierarchical clustering of web documents [J]. Karthick Seshadri, S. Mercy Shalinie. Concurrency and Computation: Practice and Experience. 2015, no. 17.
4. Efficient Parallel Hierarchical Clustering [C]. Manoranjan Dash, Simona Petrutiu, Peter Scheuermann. International Euro-Par Conference. 2004.
5. Efficient parallel formulations of hierarchical methods and their applications [D]. Grama, Ananth Y. 1996.
6. Intermetallic Cu5Zr Clusters Anchored on Hierarchical Nanoporous Copper as Efficient Catalysts for Hydrogen Evolution Reaction [O]. Hang Shi, Yi-Tong Zhou, Rui-Qi Yao, 2020.
7. Efficient Parallel Hierarchical Clustering [O]. Manoranjan Dash, Peter Scheuermann. 2004.
