IEEE International Parallel and Distributed Processing Symposium Workshops

Threaded Accurate Matrix-Matrix Multiplications with Sparse Matrix-Vector Multiplications


Abstract

Basic Linear Algebra Subprograms (BLAS) is a frequently used numerical library for linear algebra computations. However, it places little emphasis on computational accuracy, especially with respect to accuracy assurance of the results. Although some algorithms for ensuring the computational accuracy of BLAS operations have been studied, their performance on advanced computer architectures still needs to be evaluated. In this study, we parallelize high-precision matrix-matrix multiplication using thread-level parallelism, and we evaluate the result from the viewpoints of both execution speed and accuracy. We implement a method that converts dense matrices into sparse matrices by exploiting the nature of the target algorithm and then applies sparse matrix-vector multiplication. Results obtained on the FX100 supercomputer system at Nagoya University indicate that (1) the ELL-format implementation achieves a 1.43x speedup and (2) a maximum speedup of 38x compared with a conventional dense-matrix implementation using dgemm.
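The key data-layout idea in the abstract is storing a sparsified matrix in ELL (ELLPACK) format, where every row is padded to the same number of stored entries so the matrix-vector product has a regular, easily threaded inner loop. The sketch below is illustrative only and is not the paper's implementation: the function names `dense_to_ell` and `ell_spmv`, the NumPy-based storage, and the zero-threshold parameter `tol` are all assumptions made for this example.

```python
import numpy as np

def dense_to_ell(A, tol=0.0):
    """Convert a dense matrix to ELL format (illustrative helper, not the
    paper's code): each row is padded to the maximum per-row nonzero count.
    Returns (values, column indices), both of shape (rows, max_nnz_per_row)."""
    rows = A.shape[0]
    nz_cols = [np.nonzero(np.abs(A[i]) > tol)[0] for i in range(rows)]
    width = max((len(c) for c in nz_cols), default=0)
    ell_val = np.zeros((rows, width), dtype=A.dtype)
    ell_col = np.zeros((rows, width), dtype=np.int64)  # padding points at col 0
    for i, cols in enumerate(nz_cols):
        ell_val[i, :len(cols)] = A[i, cols]
        ell_col[i, :len(cols)] = cols
    return ell_val, ell_col

def ell_spmv(ell_val, ell_col, x):
    """y = A @ x from the ELL arrays. Padded slots hold value 0, so they
    contribute nothing; the fixed row width is what makes the loop regular
    and amenable to threading/vectorization."""
    return (ell_val * x[ell_col]).sum(axis=1)
```

Because padded entries store the value 0, gathering `x` at a dummy column index is harmless; the trade-off is wasted storage when row lengths vary widely, which is why ELL pays off mainly for matrices with fairly uniform rows.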
