Journal: Fortschritte der Physik

StaleLearn: Learning Acceleration with Asynchronous Synchronization Between Model Replicas on PIM



Abstract

GPUs have become popular for learning workloads because of the large amount of parallelism these workloads expose. While the GPU is effective for many learning tasks, many GPU learning applications still suffer low execution efficiency due to sparse data. Sparse data induces divergent memory accesses with low locality, so a large fraction of execution time is spent transferring data across the memory hierarchy. Although considerable effort has been devoted to reducing memory divergence, iterative-convergent learning provides a unique opportunity to reach the full potential of modern GPUs, in that it allows different threads to continue computation using stale values. In this paper, we propose StaleLearn, a learning acceleration mechanism that reduces the memory-divergence overhead of GPU learning by exploiting the stale-value tolerance of iterative-convergent learning. Based on this tolerance, StaleLearn transforms the problem of divergent memory accesses into a synchronization problem by replicating the model, and reduces the resulting synchronization overhead through asynchronous synchronization on Processor-in-Memory (PIM). The stale-value tolerance also enables a clean task decomposition between the GPU and PIM, which effectively exploits the parallelism between them. On average, our approach accelerates representative GPU learning applications by 3.17 times with existing PIM proposals.
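The core idea of the abstract can be illustrated with a minimal sketch (hypothetical, not StaleLearn's actual implementation): an iterative-convergent update reads gradients from a stale model replica that is synchronized only periodically, standing in for the asynchronous replica synchronization between the GPU-side and PIM-side copies of the model.

```python
# Hypothetical sketch of stale-value-tolerant learning with replicated models.
# Names and structure are illustrative; the real StaleLearn mechanism runs the
# replica synchronization asynchronously on PIM hardware.

class ReplicatedModel:
    def __init__(self, dim):
        self.master = [0.0] * dim   # copy updated every step (the "GPU" side)
        self.replica = [0.0] * dim  # stale copy, refreshed lazily (the "PIM" side)

    def sync(self):
        # Asynchronous synchronization, modeled here as a periodic copy.
        self.replica = list(self.master)

def train(model, target, steps=200, lr=0.1, sync_every=8):
    """Minimize 0.5 * ||w - target||^2, reading gradients from the stale replica."""
    for step in range(steps):
        stale = model.replica                          # tolerate a stale read
        grad = [w - t for w, t in zip(stale, target)]  # gradient on stale values
        model.master = [w - lr * g for w, g in zip(model.master, grad)]
        if step % sync_every == 0:                     # infrequent synchronization
            model.sync()
    return model.master

model = ReplicatedModel(dim=2)
final = train(model, target=[1.0, -2.0])
# Despite reading stale values, the iteration still converges to the target.
```

The sketch shows why stale reads are tolerable: each synchronization shrinks the replica's error, so the iteration converges even though most steps never see the latest weights; only the synchronization frequency, not per-step consistency, affects the result.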

