Conference: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum

AUTO-GC: Automatic translation of data mining applications to GPU clusters

Abstract

Because of the very favorable price-to-performance ratio of GPUs, a popular parallel programming configuration today is a cluster of GPUs. However, extracting performance on such a configuration typically requires programming in both MPI and CUDA, and therefore a high degree of expertise and effort. It is clearly desirable to support higher-level programming of this emerging high-performance computing platform. This paper reports on a code generation system that can translate data mining applications to run on a GPU cluster. Our work is driven by the observation that a common processing structure, that of generalized reductions, fits a large number of popular data mining algorithms. In our solution, programmers simply need to specify the sequential reduction loop(s) with some additional information about the parameters. We use program analysis and code generation to automatically map the applications to the API of FREERIDE, a middleware for parallel data mining. We also automatically generate CUDA code for using the GPU on each node of the cluster. We have evaluated our system using two popular data mining applications, k-means clustering and Principal Component Analysis (PCA). We observed good scalability over the number of computing nodes, and the automatically generated version showed no noticeable overhead compared to hand-written code. The speedup obtained by using the GPU rather than only the CPU on each node of the cluster is between a factor of 3 and a factor of 21.
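
To make the generalized reduction structure mentioned in the abstract concrete, the following is a minimal sketch of one k-means iteration written in that style. It is an illustration under stated assumptions, not the AUTO-GC input format or the FREERIDE API: every name here (`local_reduce`, `combine`, `update_centers`, the dictionary used as the reduction object) is hypothetical. The point it shows is that each data element updates a reduction object only through additions, so partial results computed on separate partitions can be merged afterwards.

```python
# Illustrative sketch only: all names are hypothetical, not FREERIDE or AUTO-GC APIs.

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centers):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = k, d
    return best

def new_reduction_object(k, dim):
    """Per-cluster coordinate sums and point counts, all zero."""
    return {"sums": [[0.0] * dim for _ in range(k)], "counts": [0] * k}

def local_reduce(points, centers):
    """Sequential reduction loop: each point touches the reduction object
    only through additions, so iteration order does not affect the result
    (up to floating-point rounding)."""
    robj = new_reduction_object(len(centers), len(centers[0]))
    for pt in points:
        k = nearest(pt, centers)
        robj["counts"][k] += 1
        for j, x in enumerate(pt):
            robj["sums"][k][j] += x
    return robj

def combine(a, b):
    """Merge two partial reduction objects element-wise."""
    return {
        "sums": [[x + y for x, y in zip(ra, rb)]
                 for ra, rb in zip(a["sums"], b["sums"])],
        "counts": [x + y for x, y in zip(a["counts"], b["counts"])],
    }

def update_centers(robj, old_centers):
    """New center is the mean of its points; an empty cluster keeps its old center."""
    return [
        [s / n for s in sums] if n else list(old)
        for sums, n, old in zip(robj["sums"], robj["counts"], old_centers)
    ]

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]

# One pass over all data versus two partitions reduced separately and then merged.
whole = local_reduce(points, centers)
split = combine(local_reduce(points[:2], centers),
                local_reduce(points[2:], centers))
```

In this sketch the two partitions stand in for the data held by two cluster nodes, and `combine` stands in for the cross-node merge. In a setting like the one the abstract describes, the per-partition loop is the part one would expect to be offloaded to the GPU, but how the actual system divides that work is not stated in the abstract.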