Conference: IEEE International Symposium on Parallel & Distributed Processing

An auto-tuning framework for parallel multicore stencil computations



Abstract

Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers significant performance gains of up to 22× speedup over the reference serial implementation. Overall we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems.
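
The general idea of stencil auto-tuning can be illustrated with a small sketch. It is not the framework described in the abstract, which takes Fortran 95 input and emits Fortran, C, or CUDA. This is a pure-Python illustration under simplifying assumptions: a single 5-point 2D stencil, one optimization (loop tiling), and an exhaustive timed search over a handful of tile shapes. The function names `laplacian_reference`, `laplacian_blocked`, and `autotune` are hypothetical, as are the grid and the candidate list.

```python
import time


def laplacian_reference(grid):
    """Straightforward sequential 5-point stencil sweep over the interior."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out


def laplacian_blocked(grid, bi, bj):
    """Same stencil, with the interior traversed in bi-by-bj tiles."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, bi):
        for jj in range(1, m - 1, bj):
            for i in range(ii, min(ii + bi, n - 1)):
                for j in range(jj, min(jj + bj, m - 1)):
                    out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1]
                                 - 4.0 * grid[i][j])
    return out


def autotune(grid, candidates, repeats=3):
    """Time every candidate tile shape and return the fastest correct one."""
    expected = laplacian_reference(grid)
    best_shape, best_time = None, float("inf")
    for bi, bj in candidates:
        if laplacian_blocked(grid, bi, bj) != expected:
            continue  # discard any variant that changes the result
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            laplacian_blocked(grid, bi, bj)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_shape, best_time = (bi, bj), elapsed
    return best_shape, best_time


# Hypothetical input: a small deterministic grid and four tile shapes.
grid = [[float((i * 7 + j * 3) % 11) for j in range(64)] for i in range(64)]
candidates = [(4, 64), (8, 8), (16, 32), (64, 64)]
shape, seconds = autotune(grid, candidates)
print("fastest tile shape on this machine:", shape)
```

The winning shape depends on the machine, which is the point of tuning empirically. A real auto-tuner of the kind the abstract describes would presumably search a much larger space of generated code variants on each target architecture. Tiling is unlikely to show a meaningful speedup in interpreted Python, so the sketch shows only the search structure.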
