Conference: IEEE International Symposium on Parallel Distributed Processing

Large-scale multi-dimensional document clustering on GPU clusters

Abstract

Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through a simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously on a sixteen-node cluster of moderate GPUs. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrate the potential of GPU clusters to efficiently solve massive data mining problems. To the best of our knowledge, such speedups, combined with the scalability potential and accelerator-based parallelization, are unique in the domain of document-based data mining.
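The quadratic cost mentioned in the abstract can be illustrated with a minimal, single-threaded sketch of one flocking update. This is not the authors' GPU implementation: the function names (`cosine`, `flocking_step`), the 2D positions, the attract/repel rule, and the parameters `radius`, `threshold`, and `dt` are all assumptions chosen for illustration. The point is only that every document inspects every other document, so one step performs n(n-1) pair checks.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flocking_step(pos, vel, feats, radius=2.0, threshold=0.5, dt=0.1):
    """One naive flocking update (hypothetical rule, for illustration only).

    pos, vel: lists of (x, y) tuples, one per document.
    feats:    list of feature vectors, one per document.
    Returns (new_pos, new_vel, pair_checks).
    """
    n = len(pos)
    pair_checks = 0
    new_vel = []
    for i in range(n):
        fx, fy = 0.0, 0.0
        for j in range(n):  # all-pairs loop: the source of the O(n^2) cost
            if i == j:
                continue
            pair_checks += 1
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0 or dist > radius:
                continue  # outside the neighborhood, no influence
            # Assumed rule: similar documents attract, dissimilar ones repel.
            sign = 1.0 if cosine(feats[i], feats[j]) >= threshold else -1.0
            fx += sign * dx / dist
            fy += sign * dy / dist
        new_vel.append((vel[i][0] + dt * fx, vel[i][1] + dt * fy))
    new_pos = [(p[0] + dt * v[0], p[1] + dt * v[1])
               for p, v in zip(pos, new_vel)]
    return new_pos, new_vel, pair_checks
```

In this sketch each outer-loop iteration is independent of the others, which is the kind of structure that makes an all-pairs computation a natural candidate for GPU parallelization.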
