Conference: Euromicro International Conference on Parallel, Distributed and Network-Based Processing

Clustering Goes Big: CLUBS-P, an Algorithm for Unsupervised Clustering Around Centroids Tailored For Big Data Applications

Abstract

The need to support advanced analytics on Big Data is driving data scientists' interest toward massively parallel distributed systems and the software platforms, such as MapReduce and Spark, that make their scalable utilization possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed for sequential execution must often be redesigned in order to use the distributed computational resources effectively. In this paper, we explore these problems and then propose a solution that has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. We present a parallel version of CLUBS+, named CLUBS-P, together with an ad hoc implementation based on message passing: CLUBS-MP.
