
An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit



Abstract

This paper presents a novel parallelization and quantitative characterization of various optimization strategies for data-parallel computation on a graphics processing unit (GPU) using NVIDIA's new GPU programming framework, Compute Unified Device Architecture (CUDA). CUDA is an easy-to-use development framework that has drawn the attention of many different application areas looking for dramatic speed-ups in their code. However, the performance tradeoffs in CUDA are not yet fully understood, especially for data-parallel applications. Consequently, we study two fundamental mathematical operations that are common in many data-parallel applications: convolution and accumulation. Specifically, we profile and optimize the performance of these operations on a 128-core NVIDIA GPU. We then characterize the impact of these operations on a video-based motion-tracking algorithm called vector coherence mapping, which consists of a series of convolutions and dynamically weighted accumulations, and present a comparison of different implementations and their respective performance profiles.
