
GPU-Accelerated Adjoint Algorithmic Differentiation



Abstract

Many scientific problems, such as classifier training or medical image reconstruction, can be expressed as the minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of lightweight threads and the limited memory size, both overall and per thread. We show how these limitations can be mitigated if the cost function is expressed using GPU-accelerated vector and matrix operations, which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of magnitude. The vectorization allowed usage of optimized parallel libraries during the forward and reverse passes, which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended to more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
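The tape-based reverse pass and the intrinsic-function idea in the abstract can be sketched as follows. This is a minimal illustrative example in Python with NumPy, not the authors' software: a matrix-vector product is taped as a single intrinsic node that stores only its operands and an adjoint rule, rather than the O(m·n) elementwise partial derivatives a naive scalar tape would record. All class and function names here are hypothetical.

```python
import numpy as np

class Var:
    """A node in the taped computation graph; grad holds the adjoint."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)  # (parent Var, adjoint rule) pairs
        self.grad = None

    def backward(self):
        """Seed the output adjoint and backpropagate in reverse topological order."""
        self.grad = np.ones_like(self.value)
        order, seen = [], set()
        def topo(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p, _ in v.parents:
                    topo(p)
                order.append(v)
        topo(self)
        for v in reversed(order):
            for p, rule in v.parents:
                g = rule(v.grad)
                p.grad = g if p.grad is None else p.grad + g

def matvec(A, x):
    """Matrix-vector product taped as ONE intrinsic: the tape keeps only A, x,
    and the adjoint rules, not the elementwise partials of every entry."""
    return Var(A.value @ x.value, parents=[
        (A, lambda g: np.outer(g, x.value)),  # adjoint w.r.t. A
        (x, lambda g: A.value.T @ g),         # adjoint w.r.t. x
    ])

def sumsq(v):
    """0.5 * ||v||^2 as an intrinsic; its adjoint rule is simply g * v."""
    return Var(0.5 * np.sum(v.value ** 2), parents=[(v, lambda g: g * v.value)])

# Example cost: f(x) = 0.5 * ||A x||^2, whose gradient is A^T A x.
A = Var(np.array([[1.0, 2.0], [3.0, 4.0]]))
x = Var(np.array([1.0, 1.0]))
f = sumsq(matvec(A, x))
f.backward()
print(f.value)  # 29.0
print(x.grad)   # [24. 34.]
```

In a GPU setting, the point of such intrinsics is that both the forward product and the adjoint rules (`A.T @ g`, `np.outer(g, x)`) map onto optimized parallel library calls, so the reverse pass reuses the same high-throughput kernels as the forward pass.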
