Conference: International Conference on Algorithms and Architectures for Parallel Processing

Accelerating Lattice Boltzmann Method by Fully Exposing Vectorizable Loops

Abstract

The Lattice Boltzmann Method (LBM) plays an important role in CFD applications, so accelerating LBM computation reduces simulation costs for many industries. However, loop-carried dependencies in LBM kernels prevent loops from being vectorized, and general-purpose compilers therefore miss many vectorization opportunities. This paper proposes a SIMD-aware loop transformation algorithm that fully exposes the vectorizable loops in LBM kernels. The algorithm first identifies potentially vectorizable loops according to a defined dependence table. It then applies appropriate loop transformations and array copying techniques to make the loop-carried dependencies legal for vectorization, so that the compiler vectorizes the identified loops automatically. Experiments carried out on an Intel Xeon Gold 6140 server show that the proposed algorithm significantly raises the ratio of vectorized loops to all loops in LBM kernels. The algorithm also outperforms an Intel C++ compiler and a polyhedral optimizer, accelerating LBM computation by 147% and 120%, respectively, in average lattice update speed.
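To make the idea of array copying concrete, the following is a minimal C sketch of how a copy can remove a loop-carried dependence from a streaming-style loop. It is an illustration only and not the algorithm proposed in the paper: the one-dimensional lattice, the function names `stream_in_place` and `stream_with_copy`, and the scratch buffer `tmp` are all assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In-place shift of a 1D distribution array by one cell (f[i] <- f[i-1]).
 * Iteration i reads f[i-1], which a later iteration overwrites, so the
 * result depends on the descending iteration order. This loop-carried
 * dependence is the kind of pattern that can stop a compiler from
 * vectorizing the loop. f[0] is left unchanged as a boundary value. */
void stream_in_place(double *f, size_t n)
{
    for (size_t i = n; i-- > 1; ) {
        f[i] = f[i - 1];
    }
}

/* Same result, but the old values are first copied into a scratch buffer.
 * The main loop then reads only from tmp and writes only to f, so its
 * iterations are independent and can run in any order, which leaves the
 * compiler free to auto-vectorize it. tmp must hold at least n doubles
 * and must not overlap f (hypothetical helper, not from the paper). */
void stream_with_copy(double *restrict f, double *restrict tmp, size_t n)
{
    assert(f != tmp);
    memcpy(tmp, f, n * sizeof *f);
    for (size_t i = 1; i < n; ++i) {
        f[i] = tmp[i - 1];
    }
}
```

The copy costs extra memory traffic, so whether this trade pays off depends on the kernel and the hardware; the sketch shows only why the transformed loop is legal to vectorize.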
