PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks

Abstract

Because it is difficult to provide adequate memory access to many cores on a single processor, multi-die and multi-socket multicore systems are becoming mainstream. These systems offer cache-coherent Non-Uniform Memory Access (NUMA) across several memory banks and cache hierarchies to increase memory capacity and bandwidth. Random work-stealing is a widely used technique for dynamic load balancing of tasks on multicore processors. However, it scales poorly on such NUMA systems for memory-bound applications because of cache misses and remote memory access latency. The Hierarchical Place Tree (HPT) [1] is a popular approach for improving the locality of a task-based parallel programming model, although it requires the programmer to map the dynamically unfolding tasks evenly over a NUMA system. Specifying data-affinity hints is a more natural way to map tasks than HPT. Still, a scalable work-stealing implementation of data-affinity hints remains mostly unexplored on modern NUMA systems. This paper presents PufferFish, a new async-finish parallel programming model and work-stealing runtime for NUMA systems that closely couples the data-affinity hints supplied for an asynchronous task with the HPTs in the Habanero C/C++ library (HClib). PufferFish introduces Hierarchical Elastic Tasks (HETs), which improve locality by shrinking to run on a single worker inside a place or puffing up across multiple workers, depending on the work imbalance at a particular place in an HPT. We evaluate PufferFish on a set of widely used memory-bound benchmarks exhibiting regular and irregular execution graphs. On these benchmarks, PufferFish achieves geometric mean speedups of 1.5× over the HPT implementation in HClib and 1.9× over random work-stealing in CilkPlus on a 32-core NUMA AMD EPYC processor.
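
The shrink-or-puff-up behaviour described in the abstract can be illustrated with a small toy model. The sketch below is not PufferFish's algorithm or the HClib API: the `Place` class, the `run_elastic` function, and the rule "a worker is idle when its queue is empty" are all hypothetical, chosen only to show how one elastic task might stay on a single worker when its place is busy and spread over idle workers when the place is imbalanced.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Place:
    """Toy leaf of a place tree: one NUMA domain and its workers (hypothetical)."""
    name: str
    queued_tasks: List[int]  # pending task count per worker (assumed load metric)


def idle_workers(place: Place) -> List[int]:
    """Return the indices of workers in this place with nothing queued."""
    return [w for w, n in enumerate(place.queued_tasks) if n == 0]


def run_elastic(place: Place, lo: int, hi: int, owner: int) -> List[Tuple[int, int, int]]:
    """Split the iteration range [lo, hi) of one elastic task inside `place`.

    Shrink: if no other worker is idle, the owner keeps the whole range.
    Puff up: otherwise the range is divided among the owner and idle workers.
    Returns (worker, chunk_lo, chunk_hi) triples.
    """
    team = [owner] + [w for w in idle_workers(place) if w != owner]
    base, extra = divmod(hi - lo, len(team))
    chunks = []
    start = lo
    for i, worker in enumerate(team):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size > 0:
            chunks.append((worker, start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    busy = Place("numa0", [3, 2, 4, 1])        # every worker has work: task shrinks
    imbalanced = Place("numa0", [3, 0, 0, 1])  # workers 1 and 2 are idle: task puffs up
    print(run_elastic(busy, 0, 100, owner=0))
    print(run_elastic(imbalanced, 0, 100, owner=0))
```

Because every chunk stays inside one place, the iterations keep touching memory local to that NUMA domain, which is the locality benefit the abstract attributes to elastic tasks. A real runtime would make this decision dynamically and hierarchically rather than from a static snapshot of queue lengths.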

Bibliographic details

Source: International Conference on High Performance Computing, Data, and Analytics | 2020 | pp. 251-260 | 10 pages
Author: Vivek Kumar
Original format: PDF
Keywords: Runtime; Multicore processing; Parallel programming; Computational modeling; Benchmark testing; Parallel processing; Load management

Similar literature

1. NUMA-aware Scheduling and Memory Allocation for data-flow task-parallel Applications [J]. Drebes Andi, Pop Antoniu, Heydemann Karine. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2016, No. 8.
2. Staccato: shared-memory work-stealing task scheduler with cache-aware memory management [J]. Ruslan Kuchumov, Andrey Sokolov, Vladimir Korkhov. International Journal of Web and Grid Services. 2019, No. 4.
3. A Work-Stealing Scheduler for X10's Task Parallelism with Suspension [J]. Olivier Tardieu, Haichuan Wang, Haibo Lin. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2012, No. 8.
4. A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle [C]. Justin Deters, Jiaye Wu, Yifan Xu. International Symposium on Workload Characterization. 2018.
5. A Scalable Locality-aware Adaptive Work-stealing Scheduler for Multi-core Task Parallelism [D]. Guo, Yi. 2010.
6. The Association of Academic Health Sciences Libraries legislative activities and the Joint Medical Library Association/Association of Academic Health Sciences Libraries Legislative Task Force [O]. Joan S. Zenan. 2003.
7. A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle [O]. Justin Deters, Jiaye Wu, Yifan Xu. 2018.
