International Conference on High Performance Computing, Data, and Analytics

PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks

Abstract

Due to the challenges in providing adequate memory access to many cores on a single processor, multi-die and multi-socket multicore systems are becoming mainstream. These systems offer cache-coherent Non-Uniform Memory Access (NUMA) across several memory banks and cache hierarchies to increase memory capacity and bandwidth. Random work-stealing is a widely used technique for dynamic load balancing of tasks on multicore processors. However, it scales poorly on such NUMA systems for memory-bound applications because of cache misses and remote memory access latency. The Hierarchical Place Tree (HPT) [1] is a popular approach for improving the locality of a task-based parallel programming model, although it requires the programmer to map the dynamically unfolding tasks evenly over a NUMA system. Specifying data-affinity hints is a more natural way to map tasks than HPT, but a scalable work-stealing implementation of such hints remains mostly unexplored for modern NUMA systems. This paper presents PufferFish, a new async-finish parallel programming model and work-stealing runtime for NUMA systems that closely couples the data-affinity hints provided for an asynchronous task with the HPTs in the Habanero C/C++ library (HClib). PufferFish introduces Hierarchical Elastic Tasks (HETs), which improve locality by shrinking to run on a single worker inside a place or puffing up across multiple workers, depending on the work imbalance at a particular place in an HPT. We evaluate PufferFish on a set of widely used memory-bound benchmarks exhibiting regular and irregular execution graphs. On these benchmarks, PufferFish achieves geometric mean speedups of 1.5× and 1.9× over the HPT implementation in HClib and random work-stealing in CilkPlus, respectively, on a 32-core NUMA AMD EPYC processor.
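The shrink-or-puff-up behaviour described in the abstract can be illustrated with a small sketch. This is not PufferFish or HClib code: the `Place` class, the `elastic_width` policy, and the `split_range` helper are hypothetical names, and the policy (the executing worker plus every other idle worker at the same place) is an assumption chosen only to show how a task's width might follow the work imbalance at its place.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Place:
    """Hypothetical stand-in for one NUMA place (a leaf in a place tree)."""
    name: str
    workers: int             # workers bound to this place
    other_idle_workers: int  # workers here, besides the executing one, with no work


def elastic_width(place: Place) -> int:
    """Pick how many workers an elastic task should occupy at its place.

    Assumed policy (not taken from the paper): the worker that picks up the
    task always runs it, and every other idle worker at the same place joins.
    """
    joiners = max(0, min(place.other_idle_workers, place.workers - 1))
    return 1 + joiners


def split_range(lo: int, hi: int, width: int) -> List[Tuple[int, int]]:
    """Split the iteration range [lo, hi) into `width` near-equal chunks."""
    base, extra = divmod(hi - lo, width)
    chunks = []
    start = lo
    for i in range(width):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    busy = Place("numa0", workers=8, other_idle_workers=0)
    idle = Place("numa1", workers=8, other_idle_workers=3)
    # A busy place keeps the task shrunk to a single worker.
    print(split_range(0, 100, elastic_width(busy)))
    # A place with idle workers lets the task puff up across them.
    print(split_range(0, 100, elastic_width(idle)))
```

In this sketch the task never leaves its place, which is the locality property the abstract emphasises; only the number of local workers sharing the iteration range changes.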