Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU


Abstract

Tuning the performance of sparse matrix vector multiplication (SpMV) is important but difficult because of the operation's irregularity. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, each stored in CSR format. With blocking, the corresponding part of vector x can be reused in the GPU cache, which greatly reduces the time spent accessing global memory for vector x. Experimental results on a GeForce GTX 480 show that the SpMV kernel with cache blocking is 5x faster than the unblocked CSR kernel in the best case.
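
The blocking idea in the abstract can be illustrated with a small CPU-side sketch. It is not the paper's GPU kernel: the function names are hypothetical, the matrix is split into vertical column strips only (the abstract does not specify the shape of the sub-blocks), and plain Python lists stand in for device memory. What it does show is that each sub-block is an ordinary CSR triple that reads only one contiguous slice of x, which is the property that lets that slice stay resident in cache.

```python
def csr_spmv(indptr, indices, data, x):
    """Baseline y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def split_into_column_blocks(indptr, indices, data, n_cols, block_width):
    """Partition A into vertical strips, each kept as its own CSR triple.

    Assumption for this sketch: blocks span all rows. Column indices inside
    a strip are stored relative to the strip's first column.
    """
    n_rows = len(indptr) - 1
    blocks = []
    for col_start in range(0, n_cols, block_width):
        col_end = min(col_start + block_width, n_cols)
        b_indptr, b_indices, b_data = [0], [], []
        for row in range(n_rows):
            for k in range(indptr[row], indptr[row + 1]):
                if col_start <= indices[k] < col_end:
                    b_indices.append(indices[k] - col_start)
                    b_data.append(data[k])
            b_indptr.append(len(b_indices))
        blocks.append((col_start, col_end, b_indptr, b_indices, b_data))
    return blocks


def blocked_spmv(blocks, x, n_rows):
    """Accumulate y block by block; each block reads one slice of x."""
    y = [0.0] * n_rows
    for col_start, col_end, b_indptr, b_indices, b_data in blocks:
        x_slice = x[col_start:col_end]  # the only part of x this block needs
        partial = csr_spmv(b_indptr, b_indices, b_data, x_slice)
        for row in range(n_rows):
            y[row] += partial[row]
    return y


# A 4 x 6 example matrix in CSR form (values chosen only for illustration).
indptr = [0, 2, 3, 6, 8]
indices = [0, 4, 1, 2, 3, 5, 0, 5]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

blocks = split_into_column_blocks(indptr, indices, data, n_cols=6, block_width=3)
y_blocked = blocked_spmv(blocks, x, n_rows=4)
y_plain = csr_spmv(indptr, indices, data, x)
print(y_blocked == y_plain)
```

On a GPU the block width would presumably be chosen so that one slice of x fits in the available cache; that tuning, and the mapping of rows to threads, are outside what this sketch covers.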


Bibliographic details

Source: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel & Distributed Computing, 2012, pp. 231-235 (5 pages)
Conference location: Kyoto (JP)
Authors: Xu Weizhi; Zhang Hao; Jiao Shuai; Wang Da; Song Fenglong; Liu Zhiyong
Original format: PDF
Language: English
Chinese Library Classification: Software engineering; Artificial intelligence theory

