CUDA-for-clusters: a system for efficient execution of CUDA kernels on multi-core clusters

Abstract

Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be adopted widely for performance-critical code, mainly because their software frameworks are relatively immature and because rewriting existing code in a new language takes considerable effort. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates the execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, together with the growing popularity, support, and stability of the CUDA software stack, makes CUDA a good candidate programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a lightweight software distributed shared memory to manage parallel execution. Initial results from running several standard CUDA benchmark programs show speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.
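The abstract does not say how CFC divides a kernel's work among nodes, so the following is only an illustrative sketch of the general idea of distributing a CUDA grid over a cluster. It assumes a contiguous split of thread blocks across nodes and emulates each node by running its blocks as nested loops. All names here (`partition_blocks`, `run_blocks`, `launch_on_cluster`, `vector_add`) are hypothetical and are not taken from the paper, and the nodes are simulated sequentially in one process over ordinary Python lists rather than over a real distributed shared memory.

```python
from typing import Callable, List, Tuple


def partition_blocks(num_blocks: int, num_nodes: int) -> List[Tuple[int, int]]:
    """Split block indices [0, num_blocks) into contiguous half-open ranges.

    Assumption: a simple static, near-even split; the paper's runtime may
    use a different policy.
    """
    base, extra = divmod(num_blocks, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional block each.
        count = base + (1 if node < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


def run_blocks(kernel: Callable, block_range: Tuple[int, int],
               block_dim: int, *args) -> None:
    """Emulate one node executing its share of thread blocks as loops."""
    for block_idx in range(*block_range):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx: int, block_dim: int, thread_idx: int,
               a: list, b: list, out: list) -> None:
    """Kernel body: the usual CUDA global-index computation plus a bounds guard."""
    i = block_idx * block_dim + thread_idx
    if i < len(out):
        out[i] = a[i] + b[i]


def launch_on_cluster(kernel: Callable, grid_dim: int, block_dim: int,
                      num_nodes: int, *args) -> None:
    """Hypothetical launcher: each simulated node runs its block range in turn."""
    for block_range in partition_blocks(grid_dim, num_nodes):
        run_blocks(kernel, block_range, block_dim, *args)
```

For example, `launch_on_cluster(vector_add, 3, 4, 2, a, b, out)` runs a grid of 3 blocks with 4 threads each on 2 simulated nodes. On a real cluster the per-node loops would run concurrently, and writes to `out` would have to be made visible across nodes, which is the role the abstract assigns to the software distributed shared memory.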
