Journal: Concurrency and Computation: Practice and Experience

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations

Abstract

Memory-bound algorithms show complex performance and energy-consumption behavior on multicore processors. We choose the lattice Boltzmann method (LBM) on an Intel Sandy Bridge cluster as a prototype scenario to investigate whether and how single-chip performance and power characteristics can be generalized to the highly parallel case. First, we analyze a sparse-lattice LBM implementation for complex geometries. Using a single-core performance model, we predict the intra-chip saturation characteristics and the optimal operating point in terms of energy-to-solution as a function of implementation details, clock frequency, vectorization, and the number of active cores per chip. We show that high single-core performance and a correct choice of the number of active cores per chip are the essential optimizations for the lowest energy-to-solution at minimal performance degradation. We then extrapolate to the Message Passing Interface (MPI)-parallel level and quantify the energy-saving potential of various optimizations and execution modes. At this level we find these guidelines to be even more important, especially when communication overhead is non-negligible; in our setup, we achieved energy savings of 35% in this case compared with a naive approach. We also demonstrate that simply lowering the clock speed, without further analysis, leaves most of the energy-saving potential unused. Copyright © 2015 John Wiley & Sons, Ltd.
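The chip-level reasoning above (saturation, energy-to-solution, number of active cores) can be illustrated with a toy model. The sketch below is not the authors' model; it assumes that single-core LBM performance scales linearly with clock frequency, that chip performance saturates at a fixed memory-bandwidth limit, and that chip power is a baseline term plus a per-core term that grows with frequency. All parameter values and function names (`chip_performance`, `chip_power`, `energy_per_update`, `best_core_count`) are hypothetical and are not measurements from the paper. Energy per lattice-site update is power divided by performance; with performance in MLUPS (million lattice-site updates per second) and power in watts, the result is in microjoules per update.

```python
# Toy chip-level energy model. ALL parameter values below are hypothetical
# and chosen for illustration only; they are not measurements from the paper.
P_CORE_PER_GHZ = 4.0          # assumed single-core performance, MLUPS per GHz
P_MAX = 40.0                  # assumed bandwidth-limited chip performance, MLUPS
W0, W1, W2 = 25.0, 2.0, 1.0   # assumed baseline power (W) and per-core coefficients
CORES = range(1, 9)           # assumed 8 cores per chip
CLOCK_GHZ = 2.7               # assumed fixed clock frequency


def chip_performance(n, f, p_core_per_ghz=P_CORE_PER_GHZ):
    """Linear scaling with active cores n until the bandwidth limit P_MAX."""
    return min(n * p_core_per_ghz * f, P_MAX)


def chip_power(n, f):
    """Baseline power plus a per-core part that grows with clock frequency f."""
    return W0 + n * (W1 * f + W2 * f * f)


def energy_per_update(n, f, p_core_per_ghz=P_CORE_PER_GHZ):
    """Energy per lattice-site update in microjoules (W / MLUPS)."""
    return chip_power(n, f) / chip_performance(n, f, p_core_per_ghz)


def best_core_count(f, p_core_per_ghz=P_CORE_PER_GHZ):
    """Number of active cores that minimizes energy at a fixed clock speed."""
    return min(CORES, key=lambda n: energy_per_update(n, f, p_core_per_ghz))


if __name__ == "__main__":
    n_opt = best_core_count(CLOCK_GHZ)
    print(f"scalar code: best n = {n_opt}, "
          f"E = {energy_per_update(n_opt, CLOCK_GHZ):.3f} uJ/update, "
          f"all cores: E = {energy_per_update(max(CORES), CLOCK_GHZ):.3f} uJ/update")

    # Doubling single-core performance (e.g. via SIMD vectorization) is
    # modeled here simply as a larger per-core MLUPS/GHz value.
    fast = 2 * P_CORE_PER_GHZ
    n_fast = best_core_count(CLOCK_GHZ, fast)
    print(f"2x single-core performance: best n = {n_fast}, "
          f"E = {energy_per_update(n_fast, CLOCK_GHZ, fast):.3f} uJ/update")
```

With these made-up numbers, the energy minimum at a fixed clock lies at the core count where the chip first reaches the bandwidth limit (4 cores) rather than at all 8 cores, and doubling single-core performance moves that point to 2 cores with lower energy per update. This mirrors the qualitative guidelines in the abstract; realistic values would have to come from measurements on the target hardware.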