
An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit



Abstract

This paper presents a novel parallelization and quantitative characterization of various optimization strategies for data-parallel computation on a graphics processing unit (GPU) using NVIDIA's new GPU programming framework, Compute Unified Device Architecture (CUDA). CUDA is an easy-to-use development framework that has drawn the attention of many different application areas looking for dramatic speed-ups in their code. However, the performance tradeoffs in CUDA are not yet fully understood, especially for data-parallel applications. Consequently, we study two fundamental mathematical operations that are common in many data-parallel applications: convolution and accumulation. Specifically, we profile and optimize the performance of these operations on a 128-core NVIDIA GPU. We then characterize the impact of these operations on a video-based motion-tracking algorithm called vector coherence mapping, which consists of a series of convolutions and dynamically weighted accumulations, and present a comparison of different implementations and their respective performance profiles.
