Autotuning of a Cut-Off for Task Parallel Programs


Abstract

The task parallel programming model is regarded as a promising parallel programming model with dynamic load balancing. Because it supports hierarchical parallelism, it suits parallel divide-and-conquer algorithms. Most naive divide-and-conquer task parallel programs, however, suffer from high tasking overhead because they tend to create tasks that are too fine-grained. Two key ideas improve the performance of such programs: serializing a task once a cut-off condition holds, which trades reduced concurrency against lower parallelization overhead, and applying effective transformations to the task under that condition. Both are sensitive to algorithm features, so optimization by a compiler alone is ineffective in some cases. To address this problem, we propose an autotuning framework for divide-and-conquer task parallel programs. It automatically searches for the optimal combination of three basic transformation methods and switching conditions with little programmer effort. We implemented it as an optimization pass in LLVM. The evaluation shows significant performance improvements (from 1.5x to 228x) over the original naive task parallel programs. Moreover, it demonstrates that the absolute performance obtained by our autotuning framework is comparable to that of loop parallel programs.
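
The cut-off idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's LLVM pass: the function names (`fib_serial`, `fib_task`, `tune_cutoff`), the form of the cut-off condition (`n < cutoff`), and the exhaustive timing loop are all assumptions made for illustration, and the abstract does not say which search strategy or which three transformations the framework uses. Threads stand in for tasks; in standard CPython builds with the GIL they do not speed up CPU-bound code, so the sketch shows only the structure of spawning, cutting off, and tuning.

```python
import threading
import time


def fib_serial(n):
    """Serialized version: plain recursion, no task creation."""
    return n if n < 2 else fib_serial(n - 1) + fib_serial(n - 2)


def fib_task(n, cutoff):
    """Divide-and-conquer Fibonacci that spawns one task per split
    until the (assumed) cut-off condition holds, then runs serially."""
    # Cut-off condition; max(..., 2) keeps the base cases correct
    # even when the caller passes a cut-off below 2.
    if n < max(cutoff, 2):
        return fib_serial(n)

    result = {}

    def child():
        result["left"] = fib_task(n - 1, cutoff)

    task = threading.Thread(target=child)  # "spawn" the left subproblem
    task.start()
    right = fib_task(n - 2, cutoff)        # parent works on the right one
    task.join()                            # "sync" with the spawned task
    return result["left"] + right


def tune_cutoff(n, candidates):
    """Time each candidate cut-off once and return the fastest.
    A plain exhaustive search, used here only as a stand-in."""
    best, best_time = None, float("inf")
    for cutoff in candidates:
        start = time.perf_counter()
        fib_task(n, cutoff)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = cutoff, elapsed
    return best
```

Calling `tune_cutoff(18, [8, 12, 16])` runs the same computation under three cut-offs and returns whichever was fastest on that machine. A low cut-off creates many small tasks (high overhead), while a high one leaves little concurrency, which is the tradeoff the abstract describes.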


Bibliographic record

Source
International Symposium on Embedded Multicore/Many-core Systems-on-Chip | 2016 | pp. 353-360 | 8 pages
Authors
Shintaro Iwasaki; Kenjiro Taura
Original format: PDF
Keywords
Optimization; Parallel processing; Heuristic algorithms; Parallel programming; Load modeling; Vegetation; Algorithm design and analysis;


