Conference: IEEE International Conference on Big Data Computing Service and Applications

A Distributed Multi-exemplar Affinity Propagation Clustering Algorithm Based on MapReduce

Abstract

Clustering is one of the fundamental techniques in data mining and plays a crucial role in applications such as pattern recognition, document retrieval, and computer vision. Many effective algorithms have been proposed so far. Affinity Propagation is an algorithm that requires no parameter specifying the number of clusters, which is its most distinguishing advantage over the k-means clustering algorithm. Multi-Exemplar Affinity Propagation (MEAP) extends the single-exemplar model to a multi-exemplar model, which can describe datasets with more complex structure. As the amount of data grows rapidly, the increasing size of datasets makes the clustering problem more and more challenging. Parallel computing frameworks such as MapReduce are widely used to address this problem. However, implementing the MEAP message updates in MapReduce is not straightforward and, without proper design, would be time-consuming. In this paper, we propose to exploit the stability of the data distribution to apply the MEAP algorithm on the MapReduce platform, and we develop an efficient Distributed Multi-Exemplar Affinity Propagation (DisMEAP) clustering algorithm using three MapReduce stages. The experimental results demonstrate that our algorithm performs well on large-scale datasets and achieves the same accuracy as the original MEAP algorithm.
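
To give a feel for why message updates need care in a map/reduce setting, the following is a minimal sketch of one round of the standard single-exemplar Affinity Propagation updates, written in a map-then-group-then-reduce style. It is not the MEAP message set and not the three-stage DisMEAP design, neither of which is specified in the abstract. The function names (`group_by_key`, `responsibility_step`, `availability_step`) and the in-memory grouping that stands in for a shuffle phase are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Stand-in for the shuffle phase: collect values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def responsibility_step(S, A):
    """r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')).

    Each responsibility depends only on row i, so the map phase keys
    every entry by i and the reduce phase handles one row at a time.
    Requires at least two points.
    """
    n = len(S)
    mapped = ((i, (k, S[i][k], A[i][k])) for i in range(n) for k in range(n))
    R = [[0.0] * n for _ in range(n)]
    for i, entries in group_by_key(mapped).items():
        for k, s_ik, _ in entries:
            best_other = max(a + s for k2, s, a in entries if k2 != k)
            R[i][k] = s_ik - best_other
    return R


def availability_step(R):
    """a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)));
    a(k,k) = sum over i' != k of max(0, r(i',k)).

    Each availability depends only on column k, so the data has to be
    re-keyed by k before the reduce phase.
    """
    n = len(R)
    mapped = ((k, (i, R[i][k])) for i in range(n) for k in range(n))
    A = [[0.0] * n for _ in range(n)]
    for k, entries in group_by_key(mapped).items():
        r_kk = next(r for i, r in entries if i == k)
        positive = {i: max(0.0, r) for i, r in entries if i != k}
        total = sum(positive.values())
        for i, _ in entries:
            if i == k:
                A[i][k] = total
            else:
                A[i][k] = min(0.0, r_kk + total - positive[i])
    return A
```

In this sketch the responsibility update groups by row while the availability update groups by column, so every iteration re-keys the full message matrix. A full run would alternate the two steps (usually with damping) until the exemplar set stops changing; that loop is omitted here.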
