Journal: IEEE Transactions on Parallel and Distributed Systems

Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures


Abstract

The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.
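As a rough illustration of the kind of reasoning the abstract describes, the sketch below classifies the stride of a loop's memory accesses and applies one common data transformation (interleaved records to per-field arrays) that turns a strided access into a contiguous one. This is not the paper's algorithm: the function names `access_stride` and `aos_to_soa` and the three-field record layout are assumptions made purely for this example.

```python
from typing import List, Optional, Sequence


def access_stride(indices: Sequence[int]) -> Optional[int]:
    """Return the constant stride between consecutive accesses.

    Returns None when fewer than two accesses are given or when the
    differences between consecutive indices are not all equal.
    (Hypothetical helper, not taken from the paper.)
    """
    if len(indices) < 2:
        return None
    stride = indices[1] - indices[0]
    for prev, cur in zip(indices, indices[1:]):
        if cur - prev != stride:
            return None
    return stride


def aos_to_soa(flat: Sequence[float], num_fields: int) -> List[List[float]]:
    """Split interleaved records [x0, y0, z0, x1, y1, z1, ...] into
    one contiguous list per field: [[x0, x1, ...], [y0, ...], [z0, ...]].
    (Hypothetical helper, not taken from the paper.)
    """
    return [list(flat[f::num_fields]) for f in range(num_fields)]


# A loop that reads field 0 of four 3-field records touches these offsets
# in the interleaved layout:
interleaved_accesses = [i * 3 for i in range(4)]
print(access_stride(interleaved_accesses))

# After the transformation, the same loop reads consecutive elements
# of the first per-field array:
records = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
field_arrays = aos_to_soa(records, 3)
transformed_accesses = list(range(len(field_arrays[0])))
print(access_stride(transformed_accesses))
```

A unit stride is the pattern that vector loads and coalesced accesses generally favor, which is why a layout change of this kind can matter on data-parallel hardware; the paper's actual analysis and transformations may differ.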
