Journal: ACM Transactions on Architecture and Code Optimization

A Dynamic Self-Scheduling Scheme for Heterogeneous Multiprocessor Architectures

Abstract

Today's heterogeneous architectures bring together multiple general-purpose CPUs and multiple domain-specific GPUs and FPGAs to provide dramatic speedup for many applications. The challenge, however, lies in using these heterogeneous processors to optimize overall application performance by minimizing workload completion time. Operating system and application development for these systems are still in their infancy. In this article, we propose a new scheduling and workload balancing scheme, HDSS, for executing loops with dependent or independent iterations on heterogeneous multiprocessor systems. The new algorithm dynamically learns the computational power of each processor during an adaptive phase and then schedules the remainder of the workload using a weighted self-scheduling scheme during the completion phase. Unlike previous studies, our scheme considers the runtime effects of block sizes on the performance of heterogeneous multiprocessors. It finds the right trade-off between large and small block sizes to keep the workload balanced while keeping accelerator utilization at its maximum. Our algorithm does not require offline training or architecture-specific parameters. We have evaluated our scheme on two different heterogeneous architectures: an AMD 64-core Bulldozer system with an NVIDIA Fermi C2050 GPU, and an Intel Xeon 32-core SGI Altix 4700 supercomputer with Xilinx Virtex 4 FPGAs. The experimental results show that our new scheduling algorithm achieves performance improvements of over 200% in the best cases when compared to the closest existing load balancing scheme. Our algorithm also achieves full processor utilization, with all processors completing at nearly the same time, which is significantly better than current alternative approaches.
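The two-phase idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the HDSS algorithm itself: the abstract gives no formulas, so the function name `simulate_two_phase`, the fixed probe block size, the 10% adaptive budget, and the rule "take half of the weighted share of the remaining iterations" are all hypothetical choices made for illustration.

```python
import heapq


def simulate_two_phase(total_iters, speeds, probe_block=8, adaptive_fraction=0.1):
    """Simulate a two-phase weighted self-scheduling loop (illustrative only).

    speeds: hypothetical true throughput (iterations per time unit) of each
    processor. The scheduler never reads it directly; it only sees how long
    each assigned block took.
    Returns (finish_times, iterations_done), one entry per processor.
    """
    n = len(speeds)
    remaining = total_iters
    adaptive_budget = int(total_iters * adaptive_fraction)  # assumed budget
    measured = [0.0] * n  # throughput learned from observed block timings
    done = [0] * n
    finish = [0.0] * n
    # Min-heap of (time at which the processor becomes idle, processor id).
    idle = [(0.0, p) for p in range(n)]
    heapq.heapify(idle)

    while remaining > 0:
        now, p = heapq.heappop(idle)  # next idle processor asks for work
        in_adaptive = (total_iters - remaining) < adaptive_budget
        if in_adaptive or measured[p] == 0.0:
            # Adaptive phase: small fixed blocks, used to time the processor.
            block = probe_block
        else:
            # Completion phase (assumed rule): half of this processor's
            # weighted share of whatever is left, at least one iteration.
            share = measured[p] / sum(measured)
            block = max(1, int(remaining * share / 2))
        block = min(block, remaining)
        elapsed = block / speeds[p]      # simulated execution time
        measured[p] = block / elapsed    # observed throughput
        remaining -= block
        done[p] += block
        finish[p] = now + elapsed
        heapq.heappush(idle, (finish[p], p))
    return finish, done


if __name__ == "__main__":
    # Hypothetical mix: a slow CPU core, a faster core, and an accelerator.
    finish, done = simulate_two_phase(10_000, [1.0, 4.0, 16.0])
    print("iterations per processor:", done)
    print("finish times:", finish)
```

Shrinking blocks as the remaining work shrinks is one way to reflect the block-size trade-off the abstract mentions: large early blocks keep a fast accelerator busy, while small late blocks let slow and fast processors finish at nearly the same time.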
