Thrust++: Extending Thrust Framework for Better Abstraction and Performance


Abstract

A good design abstraction framework for high-performance computing should provide a higher-level programming abstraction that balances abstraction against visibility into the hardware, so that software developers can write portable software without having to understand hardware nuances, yet still exploit the compute power optimally. In this paper we analyze a popular design abstraction framework from NVIDIA called "Thrust" and propose an extension called Thrust++ that provides abstraction over the memory hierarchy of an NVIDIA GPU. Thrust++ allows developers to make efficient use of shared memory and, overall, provides better control over the GPU memory hierarchy while writing applications in Thrust style for the CUDA backend. We show that applications written for the CUDA backend using Thrust++ have minimal performance degradation compared to their equivalent CUDA versions. Further, Thrust++ provides almost 4x speedup over Thrust for certain compute-intensive kernels that repeatedly use the reduce operation.
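The abstract ties the reported speedup to shared memory and the reduce operation, so a small illustration of that pattern may help. The sketch below is not the Thrust++ API and is not CUDA code. It is a plain C++ emulation, under stated assumptions, of the block-wise tree reduction that GPU implementations commonly stage through shared memory. The names `TILE`, `reduce_tile`, and `tiled_sum` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU emulation of a block-wise tree reduction (illustrative only).
// TILE stands in for the threads per block; the local `tile` array stands
// in for on-chip shared memory. Both are assumptions for this sketch.
constexpr std::size_t TILE = 8;  // hypothetical block size, power of two

// Reduce one tile the way a GPU block typically would: stage the data in a
// small local buffer, then halve the active range at every step.
long long reduce_tile(const std::vector<int>& in, std::size_t start) {
    long long tile[TILE] = {0};  // zero padding covers a partial last tile
    const std::size_t count = std::min(TILE, in.size() - start);
    for (std::size_t i = 0; i < count; ++i) {
        tile[i] = in[start + i];
    }

    for (std::size_t stride = TILE / 2; stride > 0; stride /= 2) {
        // On a GPU these iterations would run in parallel, followed by a barrier.
        for (std::size_t i = 0; i < stride; ++i) {
            tile[i] += tile[i + stride];
        }
    }
    return tile[0];
}

// Sum the whole input: one partial result per tile, then combine the partials.
long long tiled_sum(const std::vector<int>& in) {
    long long total = 0;
    for (std::size_t start = 0; start < in.size(); start += TILE) {
        total += reduce_tile(in, start);
    }
    return total;
}
```

In this emulation each tile is read from the input once and all intermediate sums stay in the small buffer. That is the kind of data reuse a shared-memory abstraction is meant to expose when a kernel calls reduce repeatedly.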


Bibliographic details

Source
IEEE International Conference on High Performance Computing | 2017 | 421p | 10 pages
Authors
Ajai V. George; Sankar Manoj; Sanket R. Gupte; Sayantan Mitra; Santonu Sarkar
Original format: PDF
Chinese Library Classification: TP301-53
Keywords
Graphics processing units; Hardware; Libraries; Computer architecture; Performance evaluation; Programming; Complexity theory


Similar literature

1. Thrust2D: A new design abstraction framework for structured grid class of algorithms [J]. Santonu Sarkar, Ajai V George, Sankar Manoj. Concurrency and Computation: Practice and Experience. 2018, no. 19

2. A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels [J]. Vinod Valsalam, Anthony Skjellum. Concurrency and Computation. 2002, no. 10

3. Thrust++: Extending Thrust Framework for Better Abstraction and Performance [C]. Ajai V. George, Sankar Manoj, Sanket R. Gupte. 2017 IEEE 24th International Conference on High Performance Computing. 2017

4. Extending dynamic invariant detection with explicit abstraction [D]. Keith, Daniel Brian. 2012

5. Extending i2b2 into a framework for semantic abstraction of EHR to facilitate rapid development and portability of Health IT applications [O]. Kavishwar B. Wagholikar, Layne Ainsworth, Vishal P. Vernekar. 2019

6. Performance and Optimization Abstractions for Large Scale Heterogeneous Systems in the Cactus/Chemora Framework [O]. Erik Schnetter. 2013
