Journal: Scientific Programming

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Abstract

Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile, and they require low-level architectural knowledge. Existing task scheduling policies favor quick load balancing at the expense of locality and ignore NUMA node and manycore cache access latencies when making scheduling decisions. Locality-aware scheduling, used alongside or in place of existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves programmers from reasoning about NUMA system and manycore processor architecture details by delegating data distribution to the runtime system. It also uses task data dependence information to guide the scheduling of OpenMP tasks and so reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor. Data distribution and locality-aware task scheduling improve performance by up to 69% on scientific benchmarks compared with the default policies, while still giving programmers an architecture-oblivious approach.
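The abstract describes the technique only at a high level. The C sketch below illustrates the general idea of letting task dependence information drive placement. It is not the authors' runtime.

It rests on several hypothetical assumptions:

- The runtime has block-distributed a shared region across NUMA nodes at page granularity.
- Each task's `depend` items have been reduced to page ranges.
- The scheduler places a task on the node that owns the most pages the task touches.

The type `dep_range`, the functions `home_node` and `pick_node`, the block policy, the cap `MAX_NODES`, and the "most pages wins" rule are all illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64 /* hypothetical upper bound on NUMA nodes / cache domains */

/* Hypothetical summary of one task dependence: a contiguous range of pages
 * (e.g. derived from an OpenMP depend(in/out: ...) item). */
typedef struct {
    size_t first_page; /* index of the first page the task touches */
    size_t num_pages;  /* number of consecutive pages touched */
} dep_range;

/* Assumed block distribution: page p of a total_pages-page region lives on
 * node floor(p * num_nodes / total_pages), so each node owns one contiguous
 * chunk of roughly equal size. */
int home_node(size_t page, size_t total_pages, int num_nodes) {
    assert(num_nodes > 0 && page < total_pages);
    return (int)((page * (size_t)num_nodes) / total_pages);
}

/* Locality-aware placement (illustrative rule): count how many of the task's
 * pages each node owns and return the node owning the most.
 * Ties, and tasks with no dependences, go to the lowest-numbered node. */
int pick_node(const dep_range *deps, size_t ndeps,
              size_t total_pages, int num_nodes) {
    size_t pages_on[MAX_NODES] = {0};
    assert(num_nodes > 0 && num_nodes <= MAX_NODES);

    for (size_t d = 0; d < ndeps; d++) {
        size_t end = deps[d].first_page + deps[d].num_pages;
        for (size_t p = deps[d].first_page; p < end; p++) {
            pages_on[home_node(p, total_pages, num_nodes)]++;
        }
    }

    int best = 0;
    for (int n = 1; n < num_nodes; n++) {
        if (pages_on[n] > pages_on[best]) {
            best = n;
        }
    }
    return best;
}
```

Consider an 800-page region on eight nodes, where each node owns 100 pages. A task whose dependences cover pages 150 to 179 (node 1) and 420 to 499 (node 4) would be placed on node 4, because most of its data lives there. A real runtime would enqueue the task for cores on that node and fall back to the existing load-balancing policy when queues become uneven; that part is not shown.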