Reducing the burden of parallel loop schedulers for many-core processors

Arif Mahwish; Vandierendonck Hans

Abstract

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article improves the scalability of parallel loop schedulers by specializing them for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine, with scheduling overhead 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains grow as loops become finer-grained and thread counts increase. On 48 threads we consistently observe speedups of 16%-30%, with a peak speedup of 2.8x.
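The abstract does not say how the scheduler avoids atomics. As a generic illustration only, not the authors' mechanism, the Python sketch below shows the basic reason a static schedule can hand out loop iterations without atomic operations: each worker derives its own contiguous block of iterations from its thread ID, the thread count and the loop bounds, so there is no shared work counter to update (dynamic self-scheduling typically claims chunks with an atomic fetch-and-add on such a counter). The sketch also performs a sum reduction with one private partial result per worker, combined after the join; this is a common reduction pattern, not the compiler-supported Cilk reducer scheme the article describes. The helpers static_chunk and parallel_sum are hypothetical, and CPython threads show the structure of the loop rather than real parallel speedup.

```python
import threading


def static_chunk(tid: int, nthreads: int, begin: int, end: int) -> range:
    """Iterations owned by worker `tid` under block-static scheduling (hypothetical helper).

    The range is computed only from the worker id, the thread count and the
    loop bounds, so no shared counter (and no atomic read-modify-write) is
    needed to distribute work. The first `extra` workers get one extra
    iteration each. `nthreads` must be positive.
    """
    n = max(0, end - begin)
    base, extra = divmod(n, nthreads)
    lo = begin + tid * base + min(tid, extra)
    hi = lo + base + (1 if tid < extra else 0)
    return range(lo, hi)


def parallel_sum(values, nthreads: int = 4) -> int:
    """Static parallel loop with a per-worker sum reduction (hypothetical helper)."""
    partial = [0] * nthreads  # one private slot per worker: no locks, no atomics

    def worker(tid: int) -> None:
        acc = 0
        for i in static_chunk(tid, nthreads, 0, len(values)):
            acc += values[i]
        partial[tid] = acc  # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # joining all workers acts as the end-of-loop barrier
    return sum(partial)  # combine the partial results after the barrier
```

For example, parallel_sum(list(range(100))) returns 4950, and with 4 workers and 10 iterations the workers own iterations 0-2, 3-5, 6-7 and 8-9.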

Bibliographic record

Source
Concurrency and Computation: Practice and Experience | 2021, no. 13 | pp. e6241.1-e6241.17 | 17 pages
Authors
Arif Mahwish; Vandierendonck Hans
Author affiliations

University of Cambridge, Computer Science Lab, Cambridge, England

Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Belfast, Antrim, Northern Ireland
Indexing information
Original format: PDF
Language: English
Keywords
parallel computing; shared-memory synchronization

Similar literature

1. POSTER: Reducing the Burden of Parallel Loop Schedulers for Many-Core Processors [J]. Mahwish Arif, Hans Vandierendonck. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2018, no. 1.

2. Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors [J]. Baskaran MM, Vydyanathan N, Bondhugula UK, et al. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2009, no. 4.

3. P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation [J]. Yuan Cheng, Lu Bai, Mingyu Chen, et al. Proceedings of the Workshop on Principles of Advanced and Distributed Simulation. 2010.

4. Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors [C]. Muthu Manikandan Baskaran, Nagavijayalakshmi Vydyanathan, Uday Kumar Reddy Bondhugula, et al. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2009.

5. Lane-Based Hardware Specialization for Loop- and Fork-Join-Centric Parallelization and Scheduling Strategies [D]. Shreesha Srinath. 2018.

6. An Iterative Expanding and Shrinking Process for Processor Allocation in Mixed-Parallel Workflow Scheduling [O]. Kuo-Chan Huang, Wei-Ya Wu, Feng-Jian Wang.

7. Reducing the Burden of Parallel Loop Schedulers for Many-Core Processors [O]. Mahwish Arif, Hans Vandierendonck. 2018.