2015 International Conference on Parallel Architecture and Compilation (PACT)

BSSync: Processing Near Memory for Machine Learning Workloads with Bounded Staleness Consistency Models


Abstract

Parallel machine learning workloads have become prevalent in numerous application domains. Many of these workloads are iterative convergent, allowing different threads to compute in an asynchronous manner, relaxing certain read-after-write data dependencies to use stale values. While considerable effort has been devoted to reducing the communication latency between nodes by exploiting asynchronous parallelism, inefficient use of relaxed consistency models within a single node has caused parallel implementations to have low execution efficiency. The long latency and serialization caused by atomic operations have a significant impact on performance, and data communication is not overlapped with the main computation, which further reduces execution efficiency. The inefficiency comes from the data movement between where data is stored and where it is processed. In this work, we propose Bounded Staled Sync (BSSync), hardware support for the bounded staleness consistency model that adds simple logic layers to the memory hierarchy. BSSync overlaps the long-latency atomic operations with the main computation, targeting iterative convergent machine learning workloads. Compared to previous work that allows staleness for read operations, BSSync applies staleness to write operations, allowing stale writes. We demonstrate the benefit of the proposed scheme for representative machine learning workloads. On average, our approach outperforms the baseline asynchronous parallel implementation by 1.33x.
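To make the bounded staleness idea concrete, below is a minimal software sketch of a bounded-staleness parameter store in the style of stale synchronous parallel execution. It is illustrative only: the class `BoundedStaleStore`, its methods, and the clock bookkeeping are assumptions for exposition, not the paper's hardware design, which performs the update accumulation in near-memory logic layers rather than with software locks.

```python
import threading

class BoundedStaleStore:
    """Toy software model of bounded staleness (illustrative only).

    Workers buffer their writes privately and publish them at clock
    boundaries, so other workers read values that may miss up to
    `staleness` rounds of updates -- the relaxed read-after-write
    dependency the abstract describes.
    """

    def __init__(self, num_params, num_workers, staleness):
        self.values = [0.0] * num_params                  # shared (possibly stale) copy
        self.buffers = [{} for _ in range(num_workers)]   # per-worker pending writes
        self.clocks = [0] * num_workers                   # per-worker iteration clocks
        self.staleness = staleness
        self.cond = threading.Condition()

    def read(self, worker, idx):
        # Stale read: no synchronization, may miss recent remote updates.
        return self.values[idx]

    def update(self, worker, idx, delta):
        # Stale write: buffered locally, invisible to others until clock().
        buf = self.buffers[worker]
        buf[idx] = buf.get(idx, 0.0) + delta

    def clock(self, worker):
        # Publish buffered writes, then advance; block only if this worker
        # would run more than `staleness` ticks ahead of the slowest worker.
        with self.cond:
            for idx, delta in self.buffers[worker].items():
                self.values[idx] += delta
            self.buffers[worker].clear()
            self.clocks[worker] += 1
            self.cond.notify_all()
            while self.clocks[worker] - min(self.clocks) > self.staleness:
                self.cond.wait()

def worker_loop(store, worker_id, iters):
    # Toy iterative-convergent update: drive parameter 0 toward 1.0.
    for _ in range(iters):
        grad = 0.1 * (1.0 - store.read(worker_id, 0))
        store.update(worker_id, 0, grad)
        store.clock(worker_id)

if __name__ == "__main__":
    store = BoundedStaleStore(num_params=1, num_workers=4, staleness=2)
    threads = [threading.Thread(target=worker_loop, args=(store, w, 50))
               for w in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("converged value:", store.values[0])
```

In the paper's hardware scheme, the buffered increments and their eventual merge happen in the logic layers near memory, so the long-latency atomic updates overlap with the main computation instead of serializing it; the lock in this sketch merely stands in for that mechanism.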
