Journal: ACM Transactions on Architecture and Code Optimization

Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures



Abstract

Mainstream high-performance computers today adopt multisocket multicore CPU architectures and NUMA-based memory architectures. Because traditional work-stealing schedulers are designed for single-socket architectures, they incur severe shared cache misses and remote memory accesses on these machines. To solve this problem, we propose a locality-aware work-stealing (LAWS) scheduler, which makes better use of both the shared cache and the memory system. In LAWS, a load-balanced task allocator evenly splits the dataset of a program across all memory nodes and allocates each task to the socket whose local memory node stores its data, reducing remote memory accesses. An adaptive DAG packer then uses an auto-tuning approach to optimally pack the execution DAG into cache-friendly subtrees. Once the cache-friendly subtrees are created, every socket executes them sequentially to optimize shared cache usage. Meanwhile, a triple-level work-stealing scheduler schedules the subtrees and the tasks within each subtree. Through theoretical analysis, we show that LAWS has time and space bounds comparable to those of traditional work-stealing schedulers. Experimental results show that LAWS improves the performance of memory-bound programs by up to 54.2% on AMD-based experimental platforms and by up to 48.6% on Intel-based experimental platforms compared with traditional work-stealing schedulers.
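
The load-balanced task allocator described in the abstract can be illustrated with a small model. This is only a sketch under stated assumptions, not the LAWS implementation: it assumes one memory node per socket, a dataset addressed by integer index, and one data item per task. The names `Task`, `split_evenly`, `home_socket`, and `allocate` are hypothetical, and the code models placement decisions only; it does not perform real NUMA memory binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    data_index: int  # hypothetical: index of the dataset item this task reads


def split_evenly(num_items: int, num_nodes: int) -> list[range]:
    """Partition indices 0..num_items-1 into contiguous, near-equal chunks,
    one chunk per memory node (assumption: one memory node per socket)."""
    base, extra = divmod(num_items, num_nodes)
    chunks = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if node < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def home_socket(data_index: int, chunks: list[range]) -> int:
    """Return the socket whose local memory node holds the given item."""
    for node, chunk in enumerate(chunks):
        if data_index in chunk:
            return node
    raise ValueError(f"data index {data_index} is outside the dataset")


def allocate(tasks: list[Task], chunks: list[range]) -> dict[int, list[Task]]:
    """Place each task in the queue of the socket that stores its data,
    so the task's memory accesses stay local in this model."""
    queues: dict[int, list[Task]] = {node: [] for node in range(len(chunks))}
    for task in tasks:
        queues[home_socket(task.data_index, chunks)].append(task)
    return queues


if __name__ == "__main__":
    chunks = split_evenly(num_items=10, num_nodes=4)
    tasks = [Task(task_id=i, data_index=i) for i in range(10)]
    for socket, queue in allocate(tasks, chunks).items():
        print(socket, [t.task_id for t in queue])
```

With 10 items on 4 nodes, the chunks hold 3, 3, 2, and 2 items, so no socket receives more than one item beyond any other. A real scheduler would additionally bind each chunk to its memory node through the operating system's NUMA interface, which this model leaves out.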
