首页> 外文期刊>International Journal of High Performance Computing Applications >Exploiting dense substructures for fast sparse matrix vector multiplication
【24h】

Exploiting dense substructures for fast sparse matrix vector multiplication

机译:利用密集子结构进行快速稀疏矩阵矢量乘法

获取原文
获取原文并翻译 | 示例
           

摘要

The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y←A·x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance.
机译:许多科学计算应用程序的执行时间主要由执行稀疏矩阵矢量乘法(SMV; y←A·x)所花费的时间决定。我们考虑通过利用从偏微分方程模型派生的许多稀疏矩阵中固有的密集子结构来提高SMV在多核上的性能。首先,我们在稀疏矩阵(A)的图形表示中确定不可区分的顶点,即具有相同邻接结构的顶点,并将它们分组为一个超节点。接下来,我们通过对每个超级节点中的行和列进行分组来在矩阵内有效地确定密集块。最后,通过使用合适的数据结构表示矩阵,我们在SMV期间减少了加载操作的次数,同时精确保留了A的原始稀疏结构。此外,我们使用排序技术来增强访问向量的局部性, x,得到一个SMV内核,该内核利用了矩阵中有效密集的子结构。我们在Intel Nehalem和AMD Shanghai处理器上评估我们的方案。我们观察到,对于Intel Nehalem处理器上的较大矩阵,与传统的压缩稀疏行方案相比,我们的方法平均将性能提高37.35%(阻塞压缩形式平均可将性能提高30.27%)。我们的新格式的优势与AMD处理器相似。更重要的是,如果我们为每个矩阵选择我们的方法和分块压缩方案中最好的矩阵,则平均性能提高到40.85%。其他结果表明,最佳性能方案取决于矩阵和系统。因此,我们提出了一种可用于方法选择的有效密度度量,从而为自动调谐的优化SMV内核增加了多种选择,这些内核可以利用稀疏矩阵属性和硬件属性来实现高性能。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号