Journal: International Journal of High Performance Computing Applications

Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor



Abstract

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM Blue Gene/P supercomputer's PowerPC 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results.
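For readers unfamiliar with the term, the following is a minimal reference sketch of what a 27-point stencil computes on a three-dimensional grid. It is not the authors' assembly synthesis framework and makes no attempt at the scheduling or vectorization the abstract describes. The function names `stencil27` and `uniform_weights`, the nested-list grid layout, and the choice to copy boundary points unchanged are all assumptions made for illustration.

```python
from itertools import product


def uniform_weights(value=1.0 / 27.0):
    """Return a 3x3x3 nested list of identical weights (hypothetical helper)."""
    return [[[value] * 3 for _ in range(3)] for _ in range(3)]


def stencil27(grid, weights):
    """Apply a 27-point stencil to the interior points of a 3D grid.

    grid    : nested lists indexed as grid[i][j][k]
    weights : 3x3x3 nested lists; weights[di+1][dj+1][dk+1] multiplies
              the neighbour at offset (di, dj, dk)

    Boundary points are copied unchanged (an assumption of this sketch).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Start from a copy so that boundary values carry over untouched.
    out = [[[grid[i][j][k] for k in range(nz)]
            for j in range(ny)]
           for i in range(nx)]
    # All 27 offsets in {-1, 0, 1}^3, including the centre point itself.
    offsets = list(product((-1, 0, 1), repeat=3))
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                acc = 0.0
                for di, dj, dk in offsets:
                    w = weights[di + 1][dj + 1][dk + 1]
                    acc += w * grid[i + di][j + dj][k + dk]
                out[i][j][k] = acc
    return out
```

Each interior output point reads 27 neighbouring inputs and performs 27 multiply-adds. The memory access pattern is regular, which is the sense in which such a kernel is a streaming kernel, but keeping the floating-point units busy on an in-order core still depends on careful instruction scheduling.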
