首页> 外文会议> >Efficient Parallel Hierarchical Clustering
【24h】

Efficient Parallel Hierarchical Clustering

机译:高效的并行层次聚类

获取原文
获取原文并翻译 | 示例

摘要

Hierarchical agglomerative clustering (HAC) is a common clustering method that outputs a dendrogram showing all N levels of agglomerations where N is the number of objects in the data set. High time and memory complexities are some of the major bottlenecks in its application to real-world problems. In the literature parallel algorithms are proposed to overcome these limitations. But, as this paper shows, existing parallel HAC algorithms are inefficient due to ineffective partitioning of the data. We first show how HAC follows a rule where most agglomerations have very small dissimilarity and only a small portion towards the end have large dissimilarity. Partially overlapping partitioning (POP) exploits this principle and obtains efficient yet accurate HAC algorithms. The total number of dissimilarities is reduced by a factor close to the number of cells in the partition. We present pPOP, the parallel version of POP, that is implemented on a shared memory multiprocessor architecture. Extensive theoretical analysis and experimental results are presented and show that pPOP gives close to linear speedup and outperforms the existing parallel algorithms significantly both in CPU time and memory requirements.
机译:层次聚集聚类(HAC)是一种常见的聚类方法,可输出显示所有N个聚集级别的树状图,其中N是数据集中的对象数。高时间和内存复杂性是将其应用于实际问题的主要瓶颈。在文献中提出了并行算法来克服这些限制。但是,正如本文所示,由于数据分区无效,现有的并行HAC算法效率低下。我们首先显示HAC如何遵循这样的规则,即大多数集聚区的相差很小,而末端的一小部分则相差很大。部分重叠分区(POP)利用了这一原理,并获得了有效而准确的HAC算法。相异的总数减少了接近分区中像元数的倍数。我们介绍了pPOP,它是POP的并行版本,在共享内存多处理器体系结构上实现。大量的理论分析和实验结果表明,pPOP具有接近线性的加速能力,并且在CPU时间和内存需求上均明显优于现有的并行算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号