
Using a commercial graphical processing unit and the CUDA programming language to accelerate scientific image processing applications


Abstract

In the past two years the processing power of video graphics cards has quadrupled and is approaching supercomputer levels. State-of-the-art graphical processing units (GPUs) boast theoretical computational performance in the range of 1.5 trillion floating point operations per second (1.5 teraflops). This processing power is readily accessible to the scientific community at relatively small cost. High-level programming languages are now available that expose the internal architecture of the graphics card, allowing greater algorithm optimization. This research takes the memory-access-intensive portions of an image-based iris identification algorithm and hosts them on a GPU using the C++-compatible CUDA language. The selected segmentation algorithm uses basic image processing techniques such as image inversion, value squaring, thresholding, dilation, and erosion, as well as memory- and computation-intensive calculations such as the circular Hough transform. Portions of the iris segmentation algorithm were accelerated by a factor of 77 over the 2008 GPU results. Some parts of the algorithm ran at speeds over 1600 times faster than their CPU counterparts. Strengths and limitations of the GPU's Single Instruction Multiple Data (SIMD) architecture are discussed. Memory access times, instruction execution times, programming details and code samples are presented as part of the research.
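As a rough illustration of the kind of elementwise pre-processing step the abstract names (image inversion followed by thresholding on an 8-bit grayscale image), the following is a minimal CUDA sketch. The kernel name invertThreshold, the image dimensions, the threshold value, and the 256-thread launch configuration are illustrative assumptions and are not taken from the paper's code.

```cuda
// Minimal sketch (not the paper's implementation): invert an 8-bit
// grayscale image and binarize it against a threshold, one thread per pixel.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void invertThreshold(const unsigned char* in, unsigned char* out,
                                int numPixels, unsigned char threshold)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numPixels) {
        unsigned char inverted = 255 - in[idx];       // image inversion
        out[idx] = (inverted > threshold) ? 255 : 0;  // thresholding
    }
}

int main()
{
    const int width = 640, height = 480;              // assumed image size
    const int numPixels = width * height;
    const size_t bytes = numPixels * sizeof(unsigned char);

    unsigned char* hIn  = (unsigned char*)malloc(bytes);
    unsigned char* hOut = (unsigned char*)malloc(bytes);
    for (int i = 0; i < numPixels; ++i) hIn[i] = (unsigned char)(i % 256);

    unsigned char *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    // One thread per pixel; the SIMD-style lockstep execution of these
    // uniform elementwise operations is what makes them GPU-friendly.
    int threadsPerBlock = 256;
    int blocks = (numPixels + threadsPerBlock - 1) / threadsPerBlock;
    invertThreshold<<<blocks, threadsPerBlock>>>(dIn, dOut, numPixels, 128);
    cudaDeviceSynchronize();

    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    printf("first output pixel: %d\n", hOut[0]);

    cudaFree(dIn); cudaFree(dOut);
    free(hIn); free(hOut);
    return 0;
}
```

Such uniform per-pixel kernels map cleanly onto the SIMD architecture discussed in the paper; the memory-bound steps (e.g., the circular Hough transform) are where access patterns and shared-memory usage matter most.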
