Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction


Abstract

Many sparse matrix computations can be sped up if the matrix is first reordered. Reordering was originally developed for direct methods, but it has recently become popular for improving the cache locality of parallel iterative solvers, since reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cuthill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves an average speedup of 2.88X over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations; it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
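To make the bandwidth-reduction idea concrete, here is a minimal sketch of the classic sequential Reverse Cuthill-McKee ordering in Python. It is not the parallel algorithm from the paper and not the HSL implementation. The function names `bandwidth` and `rcm`, the adjacency-dictionary representation, and the choice of a minimum-degree start vertex are assumptions made for illustration; production codes typically start from a pseudo-peripheral vertex instead.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest distance |pos[u] - pos[v]| over all edges under an ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def rcm(adj):
    """Sequential reverse Cuthill-McKee ordering of an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical format).
    """
    visited = set()
    order = []
    # Cover every connected component; try low-degree vertices first
    # (a simple heuristic, assumed here for brevity).
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            # Visit unvisited neighbours in order of increasing degree.
            for v in sorted(adj[u], key=lambda w: (len(adj[w]), w)):
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    order.reverse()  # the "reverse" step of RCM
    return order


# A 6-vertex path whose labels are scrambled, so the natural order is poor.
edges = [(0, 5), (5, 1), (1, 4), (4, 2), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(bandwidth(adj, list(range(6))))  # 5 with the natural ordering
print(bandwidth(adj, rcm(adj)))        # 1 after RCM
```

On this toy graph the ordering brings every edge next to the diagonal, which is the kind of locality that benefits SpMV. The breadth-first traversal processes vertices strictly one queue entry at a time, which suggests why the algorithm is not trivially parallel.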


Bibliographic details

Source
International Conference for High Performance Computing, Networking, Storage and Analysis | 2014 | pp. 921-932 | 12 pages
Authors
Karantasis, Konstantinos I.; Lenharth, Andrew; Nguyen, Donald; Garzarán, María J.; Pingali, Keshav
Original format: PDF
Keywords
cache storage; iterative methods; matrix multiplication; parallel algorithms; parallel machines; sparse matrices; HSL library; HSL-RCM; SpMV iteration; Stampede supercomputer; bandwidth reduction; cache locality; matrix reordering; parallel RCM; parallel implementation; parallel iterative solver; parallelization; reordering algorithm; reverse Cuthill-McKee; sequential HSL-Sloan; sparse matrix computation; sparse matrix-vector multiplication; wavefront reduction; Arrays; Bandwidth; Heuristic algorithms; Indexes; Parallel processing; Runtime; Sparse matrices


Similar literature


1. Parallel Implementations of RCM Algorithm for Bandwidth Reduction of Sparse Matrices [J]. TEMA (São Carlos). 2017, No. 3
2. Cache-Oblivious Wavefront: Improving Parallelism of Recursive Dynamic Programming Algorithms without Losing Cache-Efficiency [J]. Tang Yuan, You Ronghui, Kan Haibin. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2015, No. 8
3. Comparison of several stochastic parallel optimization algorithms for adaptive optics system without a wavefront sensor [J]. Yang H., Li X. Optics & Laser Technology. 2011, No. 3
4. A network-layer proxy for bandwidth aggregation and reduction of IP packet reordering [C]. Evensen Kristian, Kaspar Dominik, Engelstad Paal. Local Computer Networks, 2009. LCN 2009. 2009
5. Parallelization of hyperspectral imaging classification and dimensionality reduction algorithms [D]. Lugo-Beauchamp, Wilfredo E. 2004
6. Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis [O]. Shubham Chandak, Kedar Tatwawadi, Tsachy Weissman
7. Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction [O]. Konstantinos I. Karantasis, Andrew Lenharth, Donald Nguyen. 2014
