首页> 外文会议>International Conference for High Performance Computing, Networking, Storage and Analysis >Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction
【24h】

Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction

机译:带宽和波前减少的重排序算法的并行化

获取原文

摘要

Many sparse matrix computations can be speeded up if the matrix is first reordered. Reordering was originally developed for direct methods but it has recently become popular for improving the cache locality of parallel iterative solvers since reordering the matrix to reduce bandwidth and wave front can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cut hill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on the average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves a speedup of 2.88X on the average over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations, it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
机译:如果首先对矩阵进行重新排序,则可以加快许多稀疏矩阵的计算速度。重新排序最初是为直接方法开发的,但是最近由于改进矩阵以减少带宽和波前可以改善稀疏矩阵向量乘积(SpMV)的参考位置,因此改进并行迭代求解器的缓存局部性已变得很流行。迭代求解器中的内核。在本文中,我们介绍了两种广泛使用的重排序算法的第一个并行实现:反向切希尔-麦基(RCM)和斯隆(Sloan)。在Stampede超级计算机的16个内核上,我们的并行RCM平均比HSL库中最新的RCM顺序实现快5.56倍。与RCM相比,Sloan的约束明显更多,但我们的并行实现比连续HSL-Sloan的平均速度提高了2.88倍。使用并行RCM对矩阵进行重新排序,然后执行100个SpMV迭代的速度是使用HSL-RCM然后执行SpMV迭代的速度的两倍,这比不对矩阵进行重新排序的SpMV迭代的速度快1.5倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号