Writing a performance-portable matrix multiplication

Fabeiro Jorge F.; Andrade Diego; Fraguela Basilio B.

Abstract

There are several frameworks that, while providing functional portability of code across different platforms, do not automatically provide performance portability. As a consequence, programmers have to hand-tune the kernel codes for each device. The Heterogeneous Programming Library (HPL) is one of these libraries, but it has the interesting feature that the kernel codes, which implement the computation to be performed, are generated at run-time. This run-time code generation (RTCG) capability can be used, in conjunction with generic parameterized algorithms, to write performance-portable codes. In this paper we explain how these techniques can be applied to a matrix multiplication algorithm. The performance of our implementation is compared to two state-of-the-art adaptive implementations, clBLAS and ViennaCL, on four different platforms, achieving average speedups with respect to them of 1.74 and 1.44, respectively. (C) 2015 Elsevier B.V. All rights reserved.
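The idea of combining run-time code generation with a generic parameterized algorithm can be illustrated with a minimal sketch. It does not use HPL or OpenCL and is not the authors' implementation: it assumes plain Python, builds the source of a tiled matrix multiplication from a template at run time, compiles it with `exec`, and times each variant to pick the one that runs fastest on the current machine. The names `KERNEL_TEMPLATE`, `generate_kernel`, `autotune`, and `reference_matmul`, as well as the candidate tile sizes, are hypothetical and chosen only for illustration.

```python
import time

# Source template for a tiled matrix multiplication. The tile size is baked
# into the generated source as a literal constant (hypothetical template).
KERNEL_TEMPLATE = """
def matmul(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, {tile}):
        for kk in range(0, n, {tile}):
            for jj in range(0, n, {tile}):
                for i in range(ii, min(ii + {tile}, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + {tile}, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + {tile}, n)):
                            row_c[j] += a_ik * row_b[j]
    return c
"""


def generate_kernel(tile):
    """Generate and compile, at run time, a matmul specialised for one tile size."""
    namespace = {}
    exec(KERNEL_TEMPLATE.format(tile=tile), namespace)
    return namespace["matmul"]


def autotune(n, candidates=(2, 4, 8, 16)):
    """Time every generated variant on this machine and return the fastest one."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, best_time, best_kernel = None, float("inf"), None
    for tile in candidates:
        kernel = generate_kernel(tile)
        start = time.perf_counter()
        kernel(a, b, n)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time, best_kernel = tile, elapsed, kernel
    return best_tile, best_kernel


def reference_matmul(a, b, n):
    """Straightforward triple loop used only to check the generated kernels."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Calling `autotune(64)` would return the chosen tile size together with the compiled function. The selected tile depends on the machine and on timing noise, which is the point of the approach: the same generic source adapts to each platform instead of being hand-tuned for it.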


Bibliographic details

Source
Parallel Computing, 2016, No. 2, pp. 65-77 (13 pages)
Authors
Fabeiro Jorge F.; Andrade Diego; Fraguela Basilio B.
Affiliations

Univ A Coruna, Comp Architecture Grp, Coruna, Spain;

Univ A Coruna, Comp Architecture Grp, Coruna, Spain;

Univ A Coruna, Comp Architecture Grp, Coruna, Spain;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
GPGPU; Heterogeneous systems; OpenCL; Performance portability; Embedded languages


