International Journal of Image, Graphics and Signal Processing

Performance Framework for HPC Applications on Homogeneous Computing Platform


Abstract

In scientific fields, solving large and complex computational problems with central processing units (CPUs) alone is not enough to meet the computation requirements. In this work we consider a homogeneous cluster in which every node has a CPU and a graphics processing unit (GPU) of the same capability. Normally the CPU is used to control the GPU and to transfer data to it; here we combine the CPU's computational power with the GPU to run high-performance computing (HPC) applications. The framework adopts a pinned-memory technique to reduce the overhead of data transfer between CPU and GPU. To exploit the homogeneous platform we use a hybrid programming model combining the message passing interface (MPI), OpenMP (open multi-processing), and the Compute Unified Device Architecture (CUDA). The key challenge on the homogeneous platform is allocating the workload among the CPU and GPU cores. To address this challenge we propose a novel analytical workload-division strategy that predicts an effective division of work between the CPU and the GPU. Using our hybrid programming model and workload-division strategy, we observe average performance improvements of 76.06% and 84.11% in giga floating-point operations per second (GFLOPS) for N-dynamic vector addition on an NVIDIA TESLA M2075 cluster and on NVIDIA QUADRO K2000 nodes of a cluster, respectively, compared with the performance models of Simplice Donfack et al. [5]. In addition, using the pinned-memory technique with the hybrid programming model yields average performance improvements of 33.83% and 39.00% on the NVIDIA TESLA M2075 and NVIDIA QUADRO K2000, respectively, for SAXPY applications compared with the pageable-memory technique.
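The abstract's workload-division idea can be illustrated with a minimal sketch. The paper's actual analytical model is not reproduced here; the function below assumes a simple proportional split in which each processor receives work in proportion to its measured throughput, a common baseline for CPU–GPU partitioning. The function name and parameters are illustrative, not taken from the paper.

```python
# Sketch of a proportional CPU/GPU workload split (an assumed baseline
# model, not the paper's analytical strategy).

def divide_workload(n, cpu_gflops, gpu_gflops):
    """Split n work items between CPU and GPU in proportion to
    each device's sustained throughput (in GFLOPS)."""
    total = cpu_gflops + gpu_gflops
    gpu_share = round(n * gpu_gflops / total)  # GPU gets its throughput fraction
    cpu_share = n - gpu_share                  # CPU takes the remainder
    return cpu_share, gpu_share

# Example: a node whose GPU sustains 3x the CPU's throughput
# receives 3/4 of a vector-addition workload.
cpu_n, gpu_n = divide_workload(1_000_000, 100.0, 300.0)
```

In practice the split would be refined to account for the CPU-to-GPU transfer cost that the pinned-memory technique is meant to reduce.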
