首页> 外文会议>International conference on high performance computing for computational science >Evaluation of Runtime Cut-off Approaches for Parallel Programs
【24h】

Evaluation of Runtime Cut-off Approaches for Parallel Programs

机译:并行程序运行时截止方法的评估

获取原文

摘要

Parallel programs have the potential of executing several times faster than sequential programs. However, in order to achieve its potential, several aspects of the execution have to be parameterized, such as the number of threads, task granularity, etc. This work studies the task granularity of regular and irregular parallel programs on symmetrical multicore machines. Task granularity is how many parallel tasks are created to perform a certain computation. If the granularity is too coarse, there might not be enough parallelism to occupy all processors. But if granularity is too fine, a large percentage of the execution time may be spent context switching between tasks, and not performing useful work. Task granularity can be controlled by limiting the creation of new tasks, executing the workload sequentially in the current task. This decision is performed by a cut-off algorithm, which defines a criterion to execute a task workload sequentially or asynchronously. The cut-off algorithm can have a performance impact of several orders of magnitude. This work presents three new cut-off algorithms: MaxTasksInQueue, StackSize and MaxTasksSS. MaxTasksInQueue limits the size of the current thread queue, StackSize limits the number of stacks in recursive calls, and MaxTasksSS limits both the number of tasks and the number of stacks. These new algorithms can improve the performance of parallel programs. Existing studies have analyzed only two cut-off approaches at a time, each with its own set of benchmarks and machines. In this work we present a comparison of a manual threshold approach to 5 state-of-the-art algorithms (MaxTasks, MaxLevel, Adaptive Tasks Cutoff, Load-Based and Surplus Queued Task Count) and 3 new approaches (MaxTasksInQueue, StackSize and MaxTasksSS). The evaluation was performed using 24 parallel programs, including divide-and-conquer and loop programs, on two different machines with 24 and 32 hardware threads, respectively. Our analysis provided insight of how cut-off algorithms behave with different types of programs. We have also identified the best algorithms for combinations of balanced/unbalanced and loop/recursive programs.
机译:并行程序具有比顺序程序快几倍的潜力。但是,为了发挥其潜力,必须对执行的几个方面进行参数化,例如线程数,任务粒度等。这项工作研究对称多核计算机上规则和不规则并行程序的任务粒度。任务粒度是创建多少并行任务以执行特定计算。如果粒度太粗糙,则可能没有足够的并行度来占用所有处理器。但是,如果粒度太细,则可能会花费很大一部分执行时间来在任务之间进行上下文切换,而不执行有用的工作。可以通过限制新任务的创建,在当前任务中顺序执行工作负载来控制任务的粒度。该决定由截止算法执行,该算法定义了按顺序或异步执行任务工作负载的标准。截止算法可能会对性能产生几个数量级的影响。这项工作提出了三个新的截止算法:MaxTasksInQueue,StackSize和MaxTasksSS。 MaxTasksInQueue限制当前线程队列的大小,StackSize限制递归调用中的堆栈数,而MaxTasksSS限制任务数和堆栈数。这些新算法可以提高并行程序的性能。现有研究一次只分析了两种截断方法,每种方法都有自己的基准和计算机。在这项工作中,我们比较了手动阈值方法与5种最新算法(MaxTasks,MaxLevel,自适应任务截止,基于负载和剩余排队任务计数)和3种新方法(MaxTasksInQueue,StackSize和MaxTasksSS) )。评估是使用24个并行程序(包括分而治之和循环程序)在分别具有24和32个硬件线程的两台不同机器上进行的。我们的分析提供了关于截断算法如何在不同类型的程序中表现的见解。我们还为平衡/不平衡和循环/递归程序的组合确定了最佳算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号