Journal: Concurrency and Computation

Piecewise holistic autotuning of parallel programs with CERE

Abstract

Current architecture complexity requires fine tuning of compiler and runtime parameters to achieve the best performance. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process requiring long iterative evaluations. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and Replayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, thread affinity, and scheduling policy on both nonuniform memory access (NUMA) and heterogeneous architectures. Over the NAS benchmarks, we achieve an average speedup of 1.08× after tuning. Tuning a codelet is 13× cheaper than whole-program evaluation and predicts the tuning impact with 94.7% accuracy. Similarly, exploring thread configurations and scheduling policies for a Black-Scholes solver on a heterogeneous big.LITTLE architecture is over 40× faster using CERE.
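
To make the contrast between whole-program and piecewise tuning concrete, the following Python sketch picks parameters from a table of per-region replay times. It assumes, hypothetically, that each region's runtime under each configuration has already been measured (for example, by replaying its codelet) and that regions do not affect one another's performance. The region names, configuration labels, signatures, and numbers are invented for illustration and do not reflect CERE's actual interface or the paper's data. The last function sketches the invocation-grouping idea: only one invocation per behavior signature is replayed, and its time is weighted by the group size.

```python
"""Sketch: whole-program vs. piecewise (per-region) parameter selection.

Region names, configuration labels, signatures, and timings are hypothetical
illustrations; this is not CERE's interface or data from the paper.
"""
from collections import Counter
from collections.abc import Callable

# Hypothetical replay time (seconds) of each region under each configuration.
# Assumes every region was measured under the same set of configurations.
Timings = dict[str, dict[str, float]]

TIMINGS: Timings = {
    "loop_a": {"O2/8thr": 4.0, "O3/8thr": 3.5, "O3/16thr": 3.8},
    "omp_region_b": {"O2/8thr": 2.0, "O3/8thr": 2.4, "O3/16thr": 1.5},
}


def whole_program_best(timings: Timings) -> tuple[str, float]:
    """Choose one configuration for the whole program: minimize the summed time."""
    configs = next(iter(timings.values())).keys()
    totals = {c: sum(region[c] for region in timings.values()) for c in configs}
    best = min(totals, key=totals.__getitem__)
    return best, totals[best]


def piecewise_best(timings: Timings) -> tuple[dict[str, str], float]:
    """Choose the fastest configuration independently for each region."""
    choice = {name: min(region, key=region.__getitem__) for name, region in timings.items()}
    total = sum(timings[name][cfg] for name, cfg in choice.items())
    return choice, total


def grouped_estimate(signatures: list[str], replay: Callable[[str], float]) -> tuple[int, float]:
    """Replay one representative per group of invocations sharing a behavior signature.

    Returns (number of replays performed, extrapolated total time), where each
    representative's time is weighted by the size of its group.
    """
    groups = Counter(signatures)
    total = sum(replay(sig) * count for sig, count in groups.items())
    return len(groups), total


if __name__ == "__main__":
    print(whole_program_best(TIMINGS))  # single best configuration and its total
    print(piecewise_best(TIMINGS))      # per-region choices and their total
    # Four invocations but only two distinct behaviors, so only two replays.
    print(grouped_estimate(["small", "small", "small", "large"],
                           {"small": 0.1, "large": 2.0}.__getitem__))
```

With these made-up timings, the best single configuration (O3/16thr) totals 5.3 s, while choosing per region gives 5.0 s. Under the independence assumption, the piecewise total can never exceed the whole-program optimum, because each region's minimum is at most its time under any one shared configuration.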