Journal: ACM Transactions on Architecture and Code Optimization

Energy-Efficient Compilation of Irregular Task-Parallel Loops


Abstract

Energy-efficient compilation is an important problem for multi-core systems. In this context, irregular programs with task-parallel loops present interesting challenges: the threads with lighter work-loads (non-critical-threads) wait at the join-points for the thread with the maximum work-load (critical-thread), which wastes a significant amount of energy. The problem becomes more interesting on multi-socket-multi-core (MSMC) systems, where different sockets may run at different frequencies but all the cores of a socket run at a single frequency. In such a configuration, even when the load-imbalance among the cores is significant, an MSMC-oblivious technique may miss opportunities to reduce energy consumption if the load-imbalance across the sockets is minimal. The problem becomes more challenging still in the presence of mutual exclusion, where scaling the frequency of a socket executing the non-critical-threads can affect the execution time of the critical-threads. In this article, we propose a scheme (X10Ergy) to obtain energy gains with minimal impact on execution time for task-parallel languages such as X10 and HJ. X10Ergy takes as input a loop-chunked program (the parallel-loop iterations are divided into chunks, and each chunk is executed by a unique thread). X10Ergy follows a mixed compile-time + runtime approach that (i) uses static analysis to efficiently compute the work-load of each chunk at runtime, (ii) computes the "remaining" work-load of the chunks running on the cores of each socket at regular intervals and tunes the frequency of the sockets accordingly, (iii) groups the threads into different sockets (based on the remaining work-load of their respective chunks), and (iv) in the presence of atomic-blocks, models the effect of frequency-scaling on the critical-thread. We implemented X10Ergy for X10 and have obtained encouraging results for the IMSuite kernels.
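
Steps (ii) and (iii) of the abstract can be illustrated with a small sketch. It is not X10Ergy's algorithm, which the abstract does not spell out. It is a simplified illustration under assumptions that are not in the source: execution time is modelled as work divided by frequency, atomic-blocks are ignored, the input list is non-empty, and the function names `group_by_remaining_work` and `pick_frequencies` and the frequency values are hypothetical. Chunks with similar remaining work-load are placed on the same socket, and each socket then gets the lowest frequency that still lets it finish no later than the critical-thread running at the highest frequency.

```python
from typing import List, Sequence


def group_by_remaining_work(remaining: Sequence[float],
                            cores_per_socket: int) -> List[List[int]]:
    """Sort chunk ids by remaining work (heaviest first) and cut the
    sorted order into socket-sized groups (hypothetical helper)."""
    order = sorted(range(len(remaining)),
                   key=lambda i: remaining[i], reverse=True)
    return [order[i:i + cores_per_socket]
            for i in range(0, len(order), cores_per_socket)]


def pick_frequencies(remaining: Sequence[float],
                     groups: Sequence[Sequence[int]],
                     freqs: Sequence[float]) -> List[float]:
    """For each socket, pick the lowest frequency whose estimated finish
    time does not exceed that of the critical chunk at the top frequency.
    Assumes time = work / frequency, which is a simplification."""
    f_max = max(freqs)
    critical_time = max(remaining) / f_max
    chosen = []
    for group in groups:
        heaviest = max(remaining[i] for i in group)
        feasible = [f for f in sorted(freqs) if heaviest / f <= critical_time]
        # Fall back to the top frequency if no candidate is feasible.
        chosen.append(feasible[0] if feasible else f_max)
    return chosen


# Eight chunks, two sockets of four cores, three frequency levels in GHz
# (all values are made up for illustration).
remaining_work = [100, 20, 90, 10, 40, 30, 95, 15]
socket_groups = group_by_remaining_work(remaining_work, cores_per_socket=4)
socket_freqs = pick_frequencies(remaining_work, socket_groups, [1.2, 1.8, 2.4])
```

With these inputs the four heaviest chunks share the first socket, which stays at 2.4 GHz because it hosts the critical chunk, while the second socket can drop to 1.2 GHz. A runtime following the abstract would repeat this at regular intervals as the remaining work-loads shrink.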
