Journal: Fortschritte der Physik

StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM



Abstract

GPUs have become popular for learning workloads because of the large amount of parallelism these workloads expose. While the GPU is effective for many learning tasks, many GPU learning applications still suffer low execution efficiency due to sparse data. Sparse data induces divergent memory accesses with low locality, so a large fraction of execution time is spent transferring data across the memory hierarchy. Although considerable effort has been devoted to reducing memory divergence, iterative-convergent learning provides a unique opportunity to reach the full potential of modern GPUs, in that it allows different threads to continue computation using stale values. In this paper, we propose StaleLearn, a learning acceleration mechanism that reduces the memory-divergence overhead of GPU learning by exploiting the stale-value tolerance of iterative-convergent learning. Based on this tolerance, StaleLearn transforms the problem of divergent memory accesses into a synchronization problem by replicating the model, and reduces the resulting synchronization overhead through asynchronous synchronization on Processor-in-Memory (PIM). The stale-value tolerance also enables a clean task decomposition between the GPU and PIM, which effectively exploits the parallelism between them. On average, our approach accelerates representative GPU learning applications by 3.17 times with existing PIM proposals.
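The core idea of the abstract can be illustrated with a minimal sketch (hypothetical, not StaleLearn's actual implementation): an iterative-convergent update reads gradients from a stale model replica that is synchronized only periodically, standing in for the asynchronous replica synchronization between the GPU-side and PIM-side copies of the model.

```python
# Hypothetical sketch of stale-value-tolerant learning with replicated models.
# Names and structure are illustrative; the real StaleLearn mechanism runs the
# replica synchronization asynchronously on PIM hardware.

class ReplicatedModel:
    def __init__(self, dim):
        self.master = [0.0] * dim   # copy updated every step (the "GPU" side)
        self.replica = [0.0] * dim  # stale copy, refreshed lazily (the "PIM" side)

    def sync(self):
        # Asynchronous synchronization, modeled here as a periodic copy.
        self.replica = list(self.master)

def train(model, target, steps=200, lr=0.1, sync_every=8):
    """Minimize 0.5 * ||w - target||^2, reading gradients from the stale replica."""
    for step in range(steps):
        stale = model.replica                          # tolerate a stale read
        grad = [w - t for w, t in zip(stale, target)]  # gradient on stale values
        model.master = [w - lr * g for w, g in zip(model.master, grad)]
        if step % sync_every == 0:                     # infrequent synchronization
            model.sync()
    return model.master

model = ReplicatedModel(dim=2)
final = train(model, target=[1.0, -2.0])
# Despite reading stale values, the iteration still converges to the target.
```

The sketch shows why stale reads are tolerable: each synchronization shrinks the replica's error, so the iteration converges even though most steps never see the latest weights; only the synchronization frequency, not per-step consistency, affects the result.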

