Conference: International Conference on Algorithms and Architectures for Parallel Processing
Accelerating Lattice Boltzmann Method by Fully Exposing Vectorizable Loops

Abstract

The Lattice Boltzmann Method (LBM) plays an important role in CFD applications, so accelerating LBM computation lowers simulation costs for many industries. However, loop-carried dependencies in LBM kernels prevent loop vectorization, and general-purpose compilers therefore miss many vectorization opportunities. This paper proposes a SIMD-aware loop transformation algorithm that fully exposes the vectorizable loops in LBM kernels. The algorithm first identifies most of the potentially vectorizable loops according to a defined dependence table. It then applies appropriate loop transformations and array copying to legalize the loop-carried dependencies, so that the compiler vectorizes the identified loops automatically. Experiments carried out on an Intel Xeon Gold 6140 server show that the proposed algorithm significantly raises the ratio of vectorized loops to all loops in LBM kernels. The algorithm also outperforms an Intel C++ compiler and a polyhedral optimizer, accelerating LBM computation by 147% and 120%, respectively, in average lattice update speed.
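The two kinds of transformation named in the abstract, loop transformation and array copying, can be illustrated with a minimal C sketch. It is not the paper's algorithm and does not model its dependence table. It shows textbook loop interchange and array copying on toy loops. The function names, the extents `NX` and `NY`, and the weight `w` are hypothetical and chosen only for illustration. Whether a particular compiler vectorizes either form depends on its own dependence analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NX 64 /* hypothetical lattice extents, for illustration only */
#define NY 64

/* Original order: the inner i-loop carries a flow dependence, because
 * a[i][j] reads a[i-1][j], which the previous inner iteration wrote. */
void sweep_original(double a[NX][NY], double b[NX][NY])
{
    for (size_t j = 0; j < NY; ++j)
        for (size_t i = 1; i < NX; ++i)
            a[i][j] = a[i - 1][j] + b[i][j];
}

/* Loop interchange: the dependence is now carried by the outer i-loop.
 * The inner j-loop has no loop-carried dependence and walks memory with
 * unit stride, so it is a candidate for automatic vectorization. */
void sweep_interchanged(double a[NX][NY], double b[NX][NY])
{
    for (size_t i = 1; i < NX; ++i)
        for (size_t j = 0; j < NY; ++j)
            a[i][j] = a[i - 1][j] + b[i][j];
}

/* In-place pull from the right neighbour: iteration i reads f[i+1],
 * which iteration i+1 overwrites later (a loop-carried anti-dependence). */
void shift_in_place(double *f, size_t n, double w)
{
    for (size_t i = 0; i + 1 < n; ++i)
        f[i] = w * f[i + 1];
}

/* Array copying: snapshot f into tmp first, then read only from tmp.
 * Reads and writes now touch different arrays, so no dependence links
 * the iterations. tmp must hold n elements and must not alias f. */
void shift_with_copy(double *restrict f, double *restrict tmp,
                     size_t n, double w)
{
    assert(f != NULL && tmp != NULL);
    memcpy(tmp, f, n * sizeof *f);
    for (size_t i = 0; i + 1 < n; ++i)
        f[i] = w * tmp[i + 1];
}
```

Both rewritten functions compute the same values as their originals. Interchange is legal here because the only dependence runs along `i`, and the copy preserves the values that the in-place loop would have read before overwriting them. The copy costs extra memory traffic, so it is only worthwhile when the vectorized loop wins back more than the copy costs.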
