Optimal memory-aware backpropagation of deep join networks

Abstract

The memory needs of deep learning training can prevent the user from considering large models and large batch sizes. In this work, we propose to use techniques from memory-aware scheduling and automatic differentiation (AD) to execute a backpropagation graph with a bounded memory requirement, at the cost of extra recomputations. The case of a single homogeneous chain, i.e. the case of a network whose stages are all identical and form a chain, is well understood, and optimal solutions have been proposed in the AD literature. The networks encountered in practice in the context of deep learning are much more diverse, both in terms of shape and heterogeneity. In this work, we define the class of backpropagation graphs, and extend the class of graphs on which one can compute, in polynomial time, a solution that minimizes the total number of recomputations. In particular, we consider join graphs, which correspond to models such as siamese or cross-modal networks. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
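
The trade-off the abstract describes, bounding activation memory in exchange for extra forward recomputations, can be illustrated with PyTorch's built-in torch.utils.checkpoint. The sketch below is not the paper's polynomial-time optimal algorithm: it applies uniform per-stage checkpointing to a toy two-branch (siamese-style) join network, and the names Stage and JoinNet are illustrative assumptions, not from the article.

```python
# Minimal sketch: recomputation-based backpropagation with bounded
# activation memory, on a toy "join graph" (two shared-weight branches
# joined before a head, as in a siamese network).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Stage(nn.Module):
    """One homogeneous stage of a chain."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

class JoinNet(nn.Module):
    """Two inputs pass through the same chain of stages (shared
    weights), then their outputs are joined by a small head."""
    def __init__(self, dim, depth):
        super().__init__()
        self.branch = nn.ModuleList(Stage(dim) for _ in range(depth))
        self.head = nn.Linear(2 * dim, 1)

    def run_branch(self, x):
        for stage in self.branch:
            # checkpoint() discards this stage's intermediate
            # activations during the forward pass and recomputes them
            # during backpropagation: memory stays bounded per stage,
            # at the cost of one extra forward computation per stage.
            x = checkpoint(stage, x, use_reentrant=False)
        return x

    def forward(self, a, b):
        joined = torch.cat([self.run_branch(a), self.run_branch(b)], dim=-1)
        return self.head(joined)

model = JoinNet(dim=128, depth=16)
a, b = torch.randn(32, 128), torch.randn(32, 128)
loss = model(a, b).sum()
loss.backward()  # triggers the recomputations stage by stage
```

Checkpointing every stage, as above, is one fixed point on the memory/recomputation curve; the article's contribution is choosing which vertices of the backpropagation graph to store so that the total number of recomputations is minimized under a given memory bound.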