
PARABLE: A PArallel RAndom-partition Based HierarchicaL ClustEring Algorithm for the MapReduce Framework


Abstract

Large datasets, on the order of terabytes and petabytes, are becoming prevalent in many scientific domains, including astronomy, the physical sciences, bioinformatics and medicine. To store, query and analyze these repositories effectively, parallel and distributed architectures have become popular. Apache Hadoop is one such framework for supporting data-intensive applications. It provides an open source implementation of the MapReduce programming paradigm, which can be used to build scalable algorithms for pattern analysis and data mining. In this paper, we present a PArallel, RAndom-partition Based hierarchicaL clustEring algorithm (PARABLE) for the MapReduce framework. It proceeds in two main steps: local hierarchical clustering on nodes using mappers and reducers, and integration of the results by a novel dendrogram alignment technique. Empirical results on two large data sets from the KDDCup competition (High Energy Particle Physics and Intrusion Detection), obtained on a large cluster, indicate that the parallel hierarchical clustering algorithm yields significant scalability benefits while maintaining good cluster quality.
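The first of the two steps in the abstract (random partitioning followed by local hierarchical clustering in mappers and reducers) can be illustrated with a small in-memory sketch. This is not the authors' implementation: it simulates the map, shuffle and reduce phases in plain Python rather than on Hadoop, the function names are hypothetical, and single-linkage agglomerative clustering is an assumed choice because the abstract does not name a linkage criterion. The dendrogram alignment step is omitted, since the abstract does not describe how it works.

```python
import random
from collections import defaultdict


def map_random_partition(points, num_partitions, seed=0):
    """Map phase (simulated): emit (random partition id, point) pairs."""
    rng = random.Random(seed)
    for point in points:
        yield rng.randrange(num_partitions), point


def shuffle(pairs):
    """Shuffle phase (simulated): group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def reduce_local_clustering(points):
    """Reduce phase (simulated): cluster one partition bottom-up.

    Assumption: single linkage. Returns the merge history, a list of
    (members_a, members_b, distance) tuples, which encodes a dendrogram.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((tuple(clusters[i]), tuple(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def local_dendrograms(points, num_partitions, seed=0):
    """Run the simulated map, shuffle and reduce phases end to end."""
    groups = shuffle(map_random_partition(points, num_partitions, seed))
    return {key: reduce_local_clustering(part)
            for key, part in groups.items()}
```

Calling `local_dendrograms(points, num_partitions)` returns one merge history per non-empty partition. In the algorithm described by the abstract, these local dendrograms would then be integrated by the alignment technique, which this sketch leaves out.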

Bibliographic record

  • Authors: Wang Shen; Dutta Haimonti
  • Year: 2011
  • Original format: PDF
  • Language: English
