
GPU-Accelerated Adjoint Algorithmic Differentiation



Abstract

Many scientific problems, such as classifier training or medical image reconstruction, can be expressed as the minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of lightweight threads and the limited memory size, both overall and per thread. We show how these limitations can be mitigated if the cost function is expressed using GPU-accelerated vector and matrix operations, which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of magnitude. The vectorization allowed usage of optimized parallel libraries during the forward and reverse passes, which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended to more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
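The tape-based reverse pass and the intrinsic-function idea in the abstract can be sketched as follows. This is a minimal illustrative example in Python with NumPy, not the authors' software: a matrix-vector product is taped as a single intrinsic node that stores only its operands and an adjoint rule, rather than the O(m·n) elementwise partial derivatives a naive scalar tape would record. All class and function names here are hypothetical.

```python
import numpy as np

class Var:
    """A node in the taped computation graph; grad holds the adjoint."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)  # (parent Var, adjoint rule) pairs
        self.grad = None

    def backward(self):
        """Seed the output adjoint and backpropagate in reverse topological order."""
        self.grad = np.ones_like(self.value)
        order, seen = [], set()
        def topo(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    topo(p)
                order.append(v)
        topo(self)
        for v in reversed(order):
            for p, rule in v.parents:
                g = rule(v.grad)
                p.grad = g if p.grad is None else p.grad + g

def matvec(A, x):
    """Matrix-vector product taped as ONE intrinsic: the tape keeps only A, x,
    and the adjoint rules, not the elementwise partials of every entry."""
    return Var(A.value @ x.value, parents=[
        (A, lambda g: np.outer(g, x.value)),  # adjoint w.r.t. A
        (x, lambda g: A.value.T @ g),         # adjoint w.r.t. x
    ])

def sumsq(v):
    """0.5 * ||v||^2 as an intrinsic; its adjoint rule is simply g * v."""
    return Var(0.5 * np.sum(v.value ** 2), parents=[(v, lambda g: g * v.value)])

# Example cost: f(x) = 0.5 * ||A x||^2, whose gradient is A^T A x.
A = Var(np.array([[1.0, 2.0], [3.0, 4.0]]))
x = Var(np.array([1.0, 1.0]))
f = sumsq(matvec(A, x))
f.backward()
print(f.value)  # 29.0
print(x.grad)   # [24. 34.]
```

In a GPU setting, the point of such intrinsics is that both the forward product and the adjoint rules (`A.T @ g`, `np.outer(g, x)`) map onto optimized parallel library calls, so the reverse pass reuses the same high-throughput kernels as the forward pass.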
