Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures

Jang Byunghyun; Schaa Dana; Mistry Perhaad; Kaeli David

Abstract

The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4× and 13.5× over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.
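To make the idea of a data transformation that changes a loop's memory access pattern concrete, here is a minimal Python sketch. It is not the paper's algorithm; it only illustrates one well-known transformation (array of structures to structure of arrays) and how it turns a strided access into a contiguous one, which is the kind of pattern that vector loads and coalesced accesses favor. All function names (`stride_of`, `aos_addresses`, `soa_addresses`, `aos_to_soa`) are hypothetical and chosen for this illustration.

```python
from typing import List, Optional


def stride_of(addresses: List[int]) -> Optional[int]:
    """Return the constant stride of an address sequence, or None if irregular."""
    if len(addresses) < 2:
        return 0
    step = addresses[1] - addresses[0]
    for a, b in zip(addresses, addresses[1:]):
        if b - a != step:
            return None
    return step


def aos_addresses(n_elems: int, n_fields: int, field: int) -> List[int]:
    """Addresses touched when a loop reads one field in array-of-structures layout.

    Element i, field f lives at offset i * n_fields + f.
    """
    return [i * n_fields + field for i in range(n_elems)]


def soa_addresses(n_elems: int, n_fields: int, field: int) -> List[int]:
    """Addresses touched when a loop reads one field in structure-of-arrays layout.

    Field f occupies its own contiguous run of n_elems slots.
    """
    return [field * n_elems + i for i in range(n_elems)]


def aos_to_soa(flat: list, n_fields: int) -> list:
    """Reorder a flat array-of-structures buffer into structure-of-arrays order."""
    n_elems = len(flat) // n_fields
    return [flat[i * n_fields + f] for f in range(n_fields) for i in range(n_elems)]


if __name__ == "__main__":
    # Reading field 1 of 8 four-field records: strided before, contiguous after.
    print(stride_of(aos_addresses(8, 4, 1)))
    print(stride_of(soa_addresses(8, 4, 1)))
    print(aos_to_soa([0, 1, 2, 3, 4, 5], 3))
```

In this toy model, the stride equals the number of fields before the transformation and one after it, so neighboring loop iterations (or neighboring threads) touch neighboring addresses. Whether such a transformation pays off on real hardware depends on the architecture and the kernel, which is the question the paper studies.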

Bibliographic details

Source: IEEE Transactions on Parallel and Distributed Systems, 2011, Issue 1, pp. 105-118 (14 pages)
Authors: Jang Byunghyun; Schaa Dana; Mistry Perhaad; Kaeli David
Original format: PDF
Language: English
Keywords: GPU computing; general-purpose computation on GPUs (GPGPUs); data parallelism; data-parallel architectures; memory access pattern; memory coalescing; memory optimization; memory selection; vectorization
