Published in: International Symposium on Microarchitecture (MICRO)

In-Register Parameter Caching for Dynamic Neural Nets with Virtual Persistent Processor Specialization



Abstract

Dynamic neural networks offer greater representational flexibility than networks with a fixed architecture and are extensively deployed on problems with input-dependent network structure, such as those in Natural Language Processing. One of the standard optimizations in static-net training is keeping recurrent weights persistent on the chip. In dynamic nets, the possibly inhomogeneous computation graph built for every input prevents caching recurrent weights in GPU registers. Existing solutions therefore suffer from excessive recurring off-chip memory loads as well as compounded kernel-launch overheads, leading to underutilization of GPU SMs. In this paper, we present a software system that enables persistency of weight matrices during the training of dynamic neural networks on the GPU. Before training begins, our approach, named Virtual Persistent Processor Specialization (VPPS), specializes a forward-backward propagation kernel that contains in-register caching and operation routines. VPPS virtualizes persistent-kernel CTAs as CISC-like vector processors that can be guided to execute supplied instructions. VPPS greatly reduces the overall volume of off-chip loads by caching weight matrices on the chip, while simultaneously providing maximum portability: it makes no assumptions about the shape of the given computation graphs, thus meeting the requirements of dynamic nets. We implemented our solution on DyNet and abstracted away its design complexities by providing simple function calls to the user. Our experiments on the Volta microarchitecture show that, unlike the most competitive existing solutions, VPPS performs well even at small batch sizes and delivers up to a 6x speedup when training dynamic nets.
