IEEE International Conference on Big Data Computing Service and Applications

A Distributed Multi-exemplar Affinity Propagation Clustering Algorithm Based on MapReduce


Abstract

Clustering is one of the fundamental techniques in data mining and plays a crucial role in applications such as pattern recognition, document retrieval, and computer vision. Many effective clustering algorithms have been proposed so far. Affinity Propagation (AP) requires no parameter specifying the number of clusters, which is its most distinctive advantage over the k-means algorithm. Multi-Exemplar Affinity Propagation (MEAP) extends the single-exemplar model to a multi-exemplar model, which can describe datasets with more complex structure. As the amount of data grows rapidly, the increasing size of datasets makes clustering more and more challenging. Parallel computing frameworks such as MapReduce are widely used to address this problem. For MEAP, however, implementing the message updates in MapReduce is not straightforward, and without careful design it becomes time-consuming. In this paper, we exploit the stability of the data distribution to run MEAP on the MapReduce platform and develop an efficient Distributed Multi-Exemplar Affinity Propagation (DisMEAP) clustering algorithm that uses three MapReduce stages. Experimental results demonstrate that our algorithm performs well on large-scale datasets and achieves the same accuracy as the original MEAP algorithm.
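The abstract does not give the message-update equations. As background only, here is a minimal sketch of standard single-exemplar Affinity Propagation, not MEAP and not the authors' three-stage DisMEAP design. It simulates a MapReduce-style data flow in memory: the responsibility r(i,k) needs only row i of the similarities and availabilities, while the availability a(i,k) needs only column k of the responsibilities, so a distributed version must re-key (shuffle) messages between the two updates. The function names, the negative squared Euclidean similarity, the preference value, and the damping factor are illustrative assumptions.

```python
from collections import defaultdict


def similarity(p, q):
    # Assumption: negative squared Euclidean distance as the similarity s(i, k).
    return -sum((x - y) ** 2 for x, y in zip(p, q))


def responsibility_row(s_row, a_row):
    # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]; uses only row i.
    vals = [a + s for a, s in zip(a_row, s_row)]
    best = max(range(len(vals)), key=vals.__getitem__)
    first = vals[best]
    second = max(v for j, v in enumerate(vals) if j != best)
    return [s_row[k] - (second if k == best else first) for k in range(len(s_row))]


def availability_column(k, r_col):
    # r_col maps i -> r(i,k); a(i,k) uses only column k.
    pos = {i: max(0.0, r) for i, r in r_col.items() if i != k}
    total = sum(pos.values())
    col = {}
    for i in r_col:
        if i == k:
            col[i] = total  # a(k,k) = sum_{i' != k} max(0, r(i',k))
        else:
            # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
            col[i] = min(0.0, r_col[k] + total - pos[i])
    return col


def affinity_propagation(points, preference, damping=0.5, iterations=200):
    n = len(points)
    # Diagonal entries s(k,k) hold the preference; larger values yield more exemplars.
    s = [[preference if i == k else similarity(points[i], points[k])
          for k in range(n)] for i in range(n)]
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        # "Map" over rows: each row i of r is updated independently.
        new_r = [responsibility_row(s[i], a[i]) for i in range(n)]
        r = [[damping * r[i][k] + (1 - damping) * new_r[i][k] for k in range(n)]
             for i in range(n)]
        # "Shuffle": re-key responsibilities by column k.
        by_column = defaultdict(dict)
        for i in range(n):
            for k in range(n):
                by_column[k][i] = r[i][k]
        # "Reduce" per column: each column k of a is updated independently.
        new_a = [[0.0] * n for _ in range(n)]
        for k, r_col in by_column.items():
            for i, value in availability_column(k, r_col).items():
                new_a[i][k] = value
        a = [[damping * a[i][k] + (1 - damping) * new_a[i][k] for k in range(n)]
             for i in range(n)]
    # Point k is an exemplar when a(k,k) + r(k,k) > 0.
    exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    # Exemplars label themselves; other points join the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels
```

For example, `affinity_propagation([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], preference=-10.0)` should group the first three and the last three points separately. Note that the number of clusters is never passed in; it emerges from the preference, which is the property of AP the abstract highlights.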
