
Efficient sparse matrix-vector multiplication using cache oblivious extension quadtree storage format


Abstract

In this paper, we improve the sparse matrix storage format to optimize both the data locality and the parallel performance of the sparse matrix-vector multiplication (SpMVM) algorithm. First, we propose a cache-oblivious extension quadtree storage structure (COEQT), in which the sparse matrix is recursively divided into sub-regions that fit well into cache, improving data locality. Next, we present a COEQT-based SpMVM algorithm and optimize its performance through manual vectorization. With this storage format, the original SpMVM is divided into computations on relatively independent small matrices. This region-based computation framework is also suitable for high-performance computing in distributed environments, so we finally present a parallel SpMVM algorithm based on the proposed COEQT. Extensive experiments show that SpMVM using the COEQT storage format achieves on average a 1.1-1.5× speedup over the CSR format, with further gains from instruction-level optimization techniques. Experiments on the Lenovo Deepcomp 7000 show that this method achieves on average a 1.63× speedup over the Intel Cluster Math Kernel Library implementation.
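The recursive subdivision described in the abstract can be illustrated with a minimal sketch. This is not the paper's COEQT format, whose node layout and extension rules are not given here. It is a plain quadtree over coordinate-format entries, and the names `build_quadtree`, `spmv_quadtree`, `quadtree_spmv`, and the `leaf_nnz` threshold are hypothetical choices made for illustration. The sketch assumes that a region stops splitting once it holds at most `leaf_nnz` nonzeros, and that the matrix is padded to a power-of-two square so each region halves exactly.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, float]  # (row, col, value) with absolute indices


def build_quadtree(entries: List[Entry], r0: int, c0: int, size: int,
                   leaf_nnz: int) -> Optional[tuple]:
    """Recursively split the square region [r0, r0+size) x [c0, c0+size).

    `size` is assumed to be a power of two. Empty regions become None,
    small regions become leaves, all others get four child quadrants.
    """
    if not entries:
        return None
    if len(entries) <= leaf_nnz or size == 1:
        return ("leaf", entries)
    half = size // 2
    quads: List[List[Entry]] = [[], [], [], []]
    for r, c, v in entries:
        # Quadrant index: bit 1 selects the lower half, bit 0 the right half.
        q = (2 if r >= r0 + half else 0) + (1 if c >= c0 + half else 0)
        quads[q].append((r, c, v))
    children = [
        build_quadtree(quads[0], r0, c0, half, leaf_nnz),
        build_quadtree(quads[1], r0, c0 + half, half, leaf_nnz),
        build_quadtree(quads[2], r0 + half, c0, half, leaf_nnz),
        build_quadtree(quads[3], r0 + half, c0 + half, half, leaf_nnz),
    ]
    return ("node", children)


def spmv_quadtree(node: Optional[tuple], x: List[float], y: List[float]) -> None:
    """Accumulate y += A x by visiting one sub-region at a time."""
    if node is None:
        return
    kind, payload = node
    if kind == "leaf":
        for r, c, v in payload:
            y[r] += v * x[c]
    else:
        for child in payload:
            spmv_quadtree(child, x, y)


def quadtree_spmv(entries: List[Entry], n_rows: int, n_cols: int,
                  x: List[float], leaf_nnz: int = 2) -> List[float]:
    """Build the tree over a padded power-of-two square, then multiply."""
    size = 1
    while size < max(n_rows, n_cols):
        size *= 2
    tree = build_quadtree(entries, 0, 0, size, leaf_nnz)
    y = [0.0] * n_rows
    spmv_quadtree(tree, x, y)
    return y


# Small 4x4 example with seven nonzeros.
entries = [(0, 0, 2.0), (0, 3, 1.0), (1, 1, 3.0), (2, 0, 4.0),
           (2, 2, 5.0), (3, 3, 6.0), (3, 1, 1.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = quadtree_spmv(entries, 4, 4, x)
```

Each leaf touches only a small contiguous range of `x` and `y`, which is the locality argument the abstract makes. Because the leaves are independent apart from the accumulation into `y`, they are also natural units to distribute across workers.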

Bibliographic record

  • Source
    Future Generation Computer Systems | 2016, Issue 1 | pp. 490-500 | 11 pages
  • Author affiliations

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Mechanical Engineering, Hangzhou Dianzi University, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

    School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China, Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China, Zhejiang Provincial Engineering Center on Media Data Cloud Processing and Analysis, Hangzhou, 310018, China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Sparse matrix-vector multiplication; Sparse matrix storage; Data locality; Cache oblivious; Extension quadtree; Distributed parallelism;
