Parallel Processing Workshops, 2009 (ICPPW '09)

CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator

Abstract

Modern GPUs open a completely new field for optimizing embarrassingly parallel algorithms. Implementing an algorithm on a GPU confronts the programmer with a new set of optimization challenges. Some of the most notable ones are isolating the part of the algorithm that can be optimized to run on the GPU; tuning the program for the GPU memory hierarchy, whose organization and performance implications are radically different from those of general-purpose CPUs; and optimizing the program at the instruction level for the GPU. This paper makes two contributions to performance optimization for GPUs. We analyze different approaches to optimizing memory usage and access patterns for GPUs and propose a class of memory layout optimizations that take full advantage of the unique memory hierarchy of NVIDIA CUDA. Furthermore, we analyze the performance increase obtained by fully unrolling the innermost loop of the algorithm and propose guidelines on how to best unroll a program for the GPU. In particular, even though loop unrolling is a common optimization, on a GPU the performance improvement derives from a completely different aspect of the architecture. To demonstrate these optimizations, we picked an embarrassingly parallel algorithm used to calculate gravitational forces. This algorithm allows us to demonstrate and explain the performance increase gained by the applied optimizations. Our results show that our approach is quite effective. After applying our technique to the algorithm used in the Gravit gravity simulator, we observed a 1.27x speedup compared to the baseline GPU implementation. This represents an 87x speedup over the original CPU implementation.
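The abstract itself contains no code. The sketch below is only an illustration of the kind of kernel these two optimizations target, assuming a tiled N-body force calculation: body data packed as float4 and staged tile-by-tile in shared memory to suit CUDA's memory hierarchy, and an innermost loop over the tile that is fully unrolled. The kernel name gravityForces, the TILE size, the SOFTENING constant, and the float4 packing are illustrative assumptions, not details taken from the paper.

```cuda
// Hypothetical sketch (not the paper's code): a tiled N-body force kernel
// illustrating shared-memory staging of body data and full unrolling of the
// innermost loop over a tile.

#include <cuda_runtime.h>

#define TILE 128          // assumed thread-block size / tile width
#define SOFTENING 1e-9f   // assumed softening term to avoid division by zero

// Positions and masses are packed as float4 (x, y, z, mass) so that one
// coalesced 16-byte load fetches a whole body.
__global__ void gravityForces(const float4* __restrict__ bodies,
                              float4* __restrict__ accel,
                              int n)
{
    __shared__ float4 tile[TILE];   // one tile of bodies staged in shared memory

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float4 bi = (i < n) ? bodies[i] : make_float4(0.f, 0.f, 0.f, 0.f);
    float3 ai = make_float3(0.f, 0.f, 0.f);

    for (int base = 0; base < n; base += TILE) {
        // Each thread loads one body of the current tile (coalesced access).
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < n) ? bodies[j] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();

        // Fully unroll the innermost loop: the tile size is a compile-time
        // constant, so the compiler can eliminate loop overhead and keep
        // intermediate values in registers.
        #pragma unroll
        for (int k = 0; k < TILE; ++k) {
            float4 bj = tile[k];
            float3 r = make_float3(bj.x - bi.x, bj.y - bi.y, bj.z - bi.z);
            float distSqr = r.x * r.x + r.y * r.y + r.z * r.z + SOFTENING;
            float invDist = rsqrtf(distSqr);
            float s = bj.w * invDist * invDist * invDist;  // m_j / |r|^3
            ai.x += r.x * s;
            ai.y += r.y * s;
            ai.z += r.z * s;
        }
        __syncthreads();
    }

    if (i < n)
        accel[i] = make_float4(ai.x, ai.y, ai.z, 0.f);
}
```

Such a kernel would be launched with one thread per body and a block size equal to the tile width, e.g. gravityForces<<<(n + TILE - 1) / TILE, TILE>>>(d_bodies, d_accel, n); padded slots carry zero mass so they contribute no force.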
