Journal: IEEE Transactions on Parallel and Distributed Systems

Performance Modeling of Atomic Additions on GPU Scratchpad Memory

Abstract

GPU applications implemented with a scatter approach suffer write contention whenever an output element results from more than one input element, because such elements must be updated atomically. Colliding threads are serialized, which seriously harms performance. Dealing with this problem requires a proper understanding of how the scratchpad (shared) memory behaves under conflicting accesses from concurrent threads. This paper therefore presents an exhaustive microbenchmark-based analysis of atomic additions in shared memory that quantifies the impact of access conflicts on latency and throughput. The analysis led us to discover the lock mechanism that enables atomic updates to shared memory, and to propose a performance model that estimates the latency penalties caused by position conflicts and bank conflicts. From this model we derived experiments that show how to optimize applications that use atomic operations: position conflicts can be reduced by replication, and bank conflicts by padding. The benefits of these techniques are illustrated by optimizing two widely used voting processes: the centroid update step in k-means clustering and histogram calculation.
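
To make the distinction between position conflicts and bank conflicts concrete, here is a minimal Python sketch that counts both for one warp voting into a shared-memory histogram. It is a toy illustration, not the performance model proposed in the paper. The values of 32 banks and 32 threads per warp, the rule that the number of serialized passes equals the largest number of atomic accesses landing in one bank, and the helper names `conflict_degrees` and `replicated_addresses` are all assumptions made for this example.

```python
from collections import Counter

NUM_BANKS = 32   # assumption: 32 banks, one 4-byte word per bank slot
WARP_SIZE = 32   # assumption: 32 threads issue the atomic add together


def conflict_degrees(addresses):
    """Return (position_degree, bank_degree, passes) for one warp.

    position_degree: most threads hitting the same word address.
    bank_degree:     most distinct addresses mapped to the same bank.
    passes:          toy serialization estimate (assumption): the most
                     atomic accesses that land in any single bank.
    """
    per_address = Counter(addresses)
    position_degree = max(per_address.values())
    distinct_per_bank = Counter(addr % NUM_BANKS for addr in per_address)
    bank_degree = max(distinct_per_bank.values())
    accesses_per_bank = Counter(addr % NUM_BANKS for addr in addresses)
    passes = max(accesses_per_bank.values())
    return position_degree, bank_degree, passes


def replicated_addresses(bins_voted, num_bins, copies, pad):
    """Map each lane's vote to a word address in its own histogram copy.

    Lane i uses copy (i % copies); consecutive copies are (num_bins + pad)
    words apart, so pad > 0 shifts each copy to a different bank offset.
    """
    stride = num_bins + pad
    return [(lane % copies) * stride + b for lane, b in enumerate(bins_voted)]


# Worst case for a 64-bin histogram: every thread votes for bin 5.
votes = [5] * WARP_SIZE
baseline = conflict_degrees(replicated_addresses(votes, 64, copies=1, pad=0))
replicated = conflict_degrees(replicated_addresses(votes, 64, copies=4, pad=0))
padded = conflict_degrees(replicated_addresses(votes, 64, copies=4, pad=1))

print("no replication:        ", baseline)    # (32, 1, 32)
print("4 copies, no padding:  ", replicated)  # (8, 4, 32)
print("4 copies, 1-word pad:  ", padded)      # (8, 1, 8)
```

In this toy setting, replication alone cuts the position conflict degree from 32 to 8, but the four copies of bin 5 still fall in the same bank because the copy stride (64) is a multiple of the bank count. Padding each copy by one word spreads them over four banks, which is the complementary role the abstract assigns to padding. Real hardware latencies depend on the lock mechanism the paper analyzes, which this sketch does not attempt to reproduce.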