Journal: Concurrency and Computation: Practice and Experience

Reducing the burden of parallel loop schedulers for many-core processors

Abstract

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains become more important as loops become finer-grain and thread counts increase. We consistently observe 16%-30% speedup on 48 threads, with a peak of 2.8x speedup.
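
To make the idea of atomic-free static work distribution concrete, here is a minimal sketch in C++. It is not the mechanism from the article, whose details the abstract does not give. It only illustrates the general principle: each thread can derive its own iteration range from the loop bound, the thread count, and its thread id, so no shared counter is needed. The names `static_chunk` and `parallel_sum` are hypothetical, and the per-thread partial results stand in for a reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [begin, end) of iterations owned by
// thread `tid`. It depends only on (n, nthreads, tid), so threads never
// touch a shared counter and no atomic operation is required.
std::pair<std::size_t, std::size_t> static_chunk(std::size_t n,
                                                 std::size_t nthreads,
                                                 std::size_t tid) {
    assert(nthreads >= 1 && tid < nthreads);
    const std::size_t base = n / nthreads;
    const std::size_t extra = n % nthreads;  // first `extra` threads take one more
    const std::size_t begin = tid * base + (tid < extra ? tid : extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return {begin, end};
}

// Hypothetical driver: sums body(i) for i in [0, n) using static chunks.
// Each thread writes only to its own slot of `partial`; the main thread
// combines the slots after joining, so the reduction also avoids atomics.
double parallel_sum(std::size_t n, std::size_t nthreads,
                    const std::function<double(std::size_t)>& body) {
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(nthreads);
    for (std::size_t tid = 0; tid < nthreads; ++tid) {
        workers.emplace_back([&, tid] {
            const std::pair<std::size_t, std::size_t> r =
                static_chunk(n, nthreads, tid);
            double local = 0.0;  // accumulate privately, publish once
            for (std::size_t i = r.first; i < r.second; ++i) local += body(i);
            partial[tid] = local;
        });
    }
    for (std::thread& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

For example, `parallel_sum(100, 4, [](std::size_t i) { return double(i); })` splits iterations 0..99 into four ranges of 25. A production scheduler would reuse a persistent thread pool rather than spawn threads per loop, and would pad `partial` to avoid false sharing. Both are omitted here for brevity.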
