Performance Modeling of Atomic Additions on GPU Scratchpad Memory

Gomez-Luna Juan; Gonzalez-Linares Jose Maria; Benavides Benitez Jose Ignacio; Guil Mata Nicolas

Abstract

GPU applications that use scatter approaches suffer write contention when an output element is computed from more than one input element, because the output elements must be updated atomically. Colliding threads are serialized, which seriously harms performance. Dealing with these issues requires a proper understanding of how the scratchpad (shared) memory behaves under conflicting accesses from concurrent threads. This paper therefore presents an exhaustive microbenchmark-based analysis of atomic additions in shared memory that quantifies the impact of access conflicts on latency and throughput. The analysis led us to discover the lock mechanism that enables atomic updates to shared memory and to propose a performance model that estimates the latency penalties due to collisions, whether position conflicts or bank conflicts. From this model we derived experiments that show how to optimize applications that use atomic operations. Position conflicts can be reduced by replication and bank conflicts by padding. The benefits of these techniques are illustrated by optimizing two widely used voting processes: the centroid updating step in k-means clustering, and histogram calculation.
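
As a rough illustration of why replication and padding help, the sketch below counts how many serialized passes one warp would need for its atomic additions under a deliberately simplified assumption: every access that lands in the same bank is issued one after another. This is a toy model for intuition, not the performance model proposed in the paper. The 32-bank layout, the warp size of 32, and the helper names `serialized_passes` and `warp_addresses` are all assumptions made for this example.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, one 4-byte word per bank slot
WARP_SIZE = 32  # assumption: 32 threads per warp


def serialized_passes(word_indices):
    """Toy estimate: accesses that fall in the same bank are serialized."""
    per_bank = Counter(index % NUM_BANKS for index in word_indices)
    return max(per_bank.values())


def warp_addresses(bins_voted, num_bins, replicas=1, padding=0):
    """Word index updated by each thread; thread t uses sub-histogram t % replicas."""
    stride = num_bins + padding  # padding shifts each replica to a different bank
    return [(t % replicas) * stride + b for t, b in enumerate(bins_voted)]


# Worst case for a histogram: every thread in the warp votes for bin 5.
votes = [5] * WARP_SIZE
baseline = serialized_passes(warp_addresses(votes, 256))
replicated = serialized_passes(warp_addresses(votes, 256, replicas=4))
padded = serialized_passes(warp_addresses(votes, 256, replicas=4, padding=1))
full = serialized_passes(warp_addresses(votes, 256, replicas=32, padding=1))
print(baseline, replicated, padded, full)  # 32 32 8 1
```

In this toy setting, four replicas without padding do not help, because a 256-bin sub-histogram is a multiple of 32 words long, so bin 5 of every replica still maps to the same bank. Adding one word of padding per replica moves the copies to different banks, which is the combined effect the abstract attributes to replication and padding.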

Bibliographic details

Source: IEEE Transactions on Parallel and Distributed Systems, 2013, Issue 11, pp. 2273-2282 (10 pages)
Authors: Gomez-Luna Juan; Gonzalez-Linares Jose Maria; Benavides Benitez Jose Ignacio; Guil Mata Nicolas
Affiliation: University of Córdoba, Córdoba
Original format: PDF
Language: English
Keywords: CUDA; GPU; K-means; Performance model; atomic operations; histogram; shared memory


