An auto-tuning framework for parallel multicore stencil computations

Abstract

Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers significant performance gains of up to 22× speedup over the reference serial implementation. Overall we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems.
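
The abstract describes auto-tuning only at a high level. As a rough illustration of the generic loop behind it (enumerate candidate implementation parameters, check each variant against a reference, time it, keep the fastest), here is a minimal Python sketch under stated assumptions. It tunes only a hypothetical cache-blocking parameter (bi, bj) for a 2D 5-point Laplacian stencil, and the names reference_stencil, blocked_stencil, and autotune are invented for this example. It is not the paper's framework, which translates Fortran 95 stencil expressions into tuned Fortran, C, or CUDA code. In pure Python, timing differences mostly reflect interpreter overhead rather than cache behavior, so the sketch shows the structure of the search, not realistic performance.

```python
import itertools
import time


def reference_stencil(grid):
    """Naive 2D 5-point Laplacian on interior points; boundary outputs stay 0.0."""
    n, m = len(grid), len(grid[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out


def blocked_stencil(grid, bi, bj):
    """Same stencil, traversed in hypothetical bi x bj tiles (the tuning knob)."""
    n, m = len(grid), len(grid[0])
    out = [[0.0] * m for _ in range(n)]
    for ii in range(1, n - 1, bi):
        for jj in range(1, m - 1, bj):
            for i in range(ii, min(ii + bi, n - 1)):
                row_up, row, row_dn = grid[i - 1], grid[i], grid[i + 1]
                out_row = out[i]
                for j in range(jj, min(jj + bj, m - 1)):
                    # Same operand order as the reference, so results match exactly.
                    out_row[j] = (row_up[j] + row_dn[j]
                                  + row[j - 1] + row[j + 1]
                                  - 4.0 * row[j])
    return out


def autotune(grid, candidates, trials=3):
    """Verify and time each (bi, bj) candidate; return the fastest and all timings."""
    reference = reference_stencil(grid)
    timings = {}
    for bi, bj in candidates:
        # Reject any variant whose output differs from the reference.
        if blocked_stencil(grid, bi, bj) != reference:
            raise ValueError(f"block size {(bi, bj)} produced a wrong result")
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            blocked_stencil(grid, bi, bj)
            best = min(best, time.perf_counter() - start)
        timings[(bi, bj)] = best  # keep the best of several trials to reduce noise
    best_params = min(timings, key=timings.get)
    return best_params, timings


if __name__ == "__main__":
    n = 64
    grid = [[float((i * 31 + j * 17) % 7) for j in range(n)] for i in range(n)]
    candidates = list(itertools.product([4, 16, 64], repeat=2))
    best, timings = autotune(grid, candidates)
    print("fastest block size:", best)
```

The printed block size is whichever candidate measured fastest on the machine at hand and may change between runs. A practical tuner would explore more parameters and benchmark compiled code rather than interpreted loops.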

Bibliographic record

Source: IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, 12 pages
Authors: Kamil S.; Chan C.; Oliker L.; Shalf J.; Williams S.
Format: PDF
Chinese Library Classification: TP311.138-53

Similar literature

1. Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters [J]. Hikmet Dursun, Manaschai Kunaseth, Ken-ichi Nomura. Journal of Supercomputing, 2012, No. 2.

2. Efficient multicore-aware parallelization strategies for iterative stencil computations [J]. Jan Treibig, Gerhard Wellein, Georg Hager. Journal of Computational Science, 2011, No. 2.

3. Tuning framework for stencil computation in heterogeneous parallel platforms [J]. Ben Cheikh Taieb Lamine, Aguiar Alexandra, Tahar Sofiene. Journal of Supercomputing, 2016, No. 2.

4. An auto-tuning framework for parallel multicore stencil computations [C]. Kamil Shoaib, Chan Cy, Oliker Leonid. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.

5. Auto-tuning stencil codes for cache-based multicore platforms [D]. Datta, Kaushik. 2009.

6. COMBImage2: a parallel computational framework for higher-order drug combination analysis that includes automated plate design matched filter based object counting and temporal data mining [O]. Efthymia Chantzi, Malin Jarvius, Mia Niklasson. 2019.

7. An auto-tuning framework for parallel multicore stencil computations [O]. Shoaib Kamil, Cy Chan, Leonid Oliker. 2010.
