24th ACM International Conference on Supercomputing, 2010

Speeding Up Nek5000 with Autotuning and Specialization

Abstract

Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation, in order to select the best-performing solution for a particular architecture. Specialization optimizes code customized to a particular class of input data set. In this paper, we demonstrate how compiler-based autotuning that incorporates specialization for expected data set sizes of key computations can be used to speed up Nek5000, a spectral-element code. Nek5000 makes heavy use of what are effectively Basic Linear Algebra Subroutine (BLAS) calls, but for very small matrices. Through autotuning and specialization, we can achieve significant performance gains over hand-tuned libraries (e.g., Goto, ATLAS, and ACML BLAS). Additional performance gains are obtained from using higher-level compiler optimizations that aggregate multiple BLAS calls. We demonstrate more than 2.2X performance gains on an Opteron over the original manually tuned implementation, and speedups of up to 1.26X on the entire application running on 256 nodes of the Cray XT5 Jaguar system at Oak Ridge.
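
To make the two ideas in the abstract concrete, here is a minimal sketch of size specialization combined with empirical autotuning for a small matrix multiply. It is not the paper's compiler-based method: it is written in Python purely for illustration, and every name in it (`mxm_generic`, `make_specialized`, `autotune`) is hypothetical. The assumption is that the matrix sizes are known ahead of time, so a variant with those sizes baked in (and optionally the inner loop fully unrolled) can be generated and timed against the generic routine.

```python
import timeit


def mxm_generic(a, b, n1, n2, n3):
    """Generic C = A * B on row-major flat lists; A is n1 x n2, B is n2 x n3."""
    c = [0.0] * (n1 * n3)
    for i in range(n1):
        for j in range(n3):
            s = 0.0
            for k in range(n2):
                s += a[i * n2 + k] * b[k * n3 + j]
            c[i * n3 + j] = s
    return c


def make_specialized(n1, n2, n3, unroll_k):
    """Generate a multiply with the sizes baked in as constants.

    With unroll_k=True the inner (k) loop is fully unrolled, which is only
    sensible because the matrices are assumed to be very small.
    """
    lines = [
        "def mxm(a, b):",
        f"    c = [0.0] * {n1 * n3}",
        f"    for i in range({n1}):",
        f"        for j in range({n3}):",
    ]
    if unroll_k:
        terms = " + ".join(
            f"a[i * {n2} + {k}] * b[{k * n3} + j]" for k in range(n2)
        )
        lines.append(f"            c[i * {n3} + j] = {terms}")
    else:
        lines += [
            "            s = 0.0",
            f"            for k in range({n2}):",
            f"                s += a[i * {n2} + k] * b[k * {n3} + j]",
            f"            c[i * {n3} + j] = s",
        ]
    lines.append("    return c")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated variant
    return namespace["mxm"]


def autotune(n1, n2, n3, repeats=200):
    """Time each candidate on sample data and return the fastest one."""
    a = [float(i % 7) for i in range(n1 * n2)]
    b = [float(i % 5) for i in range(n2 * n3)]
    candidates = {
        "generic": lambda x, y: mxm_generic(x, y, n1, n2, n3),
        "specialized": make_specialized(n1, n2, n3, unroll_k=False),
        "specialized_unrolled": make_specialized(n1, n2, n3, unroll_k=True),
    }
    timings = {
        name: min(timeit.repeat(lambda f=f: f(a, b), number=repeats, repeat=3))
        for name, f in candidates.items()
    }
    best = min(timings, key=timings.get)
    return best, candidates[best], timings
```

Calling `autotune(8, 8, 8)` returns the name of the fastest variant on the current machine, the function itself, and the measured timings. Which variant wins depends on the platform, which is the point of measuring rather than guessing.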