Published in: Parallel Computing

Writing a performance-portable matrix multiplication

Abstract

Several frameworks provide functional portability of code across different platforms but do not automatically provide performance portability. As a consequence, programmers have to hand-tune the kernel codes for each device. The Heterogeneous Programming Library (HPL) is one of these frameworks, but it has the interesting feature that the kernel codes, which implement the computation to be performed, are generated at run time. This run-time code generation (RTCG) capability can be used, in conjunction with generic parameterized algorithms, to write performance-portable code. In this paper we explain how these techniques can be applied to a matrix multiplication algorithm. We compare the performance of our implementation with two state-of-the-art adaptive implementations, clBLAS and ViennaCL, on four different platforms, achieving average speedups of 1.74 and 1.44 over them, respectively. © 2015 Elsevier B.V. All rights reserved.
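The abstract names the approach but not HPL's API. What follows is therefore only a minimal Python sketch of the general idea under our own assumptions. It is not HPL's interface or the paper's algorithm. A generator parameterized by a tile size and an unroll factor emits the source of a matrix multiplication kernel at run time. Both parameters are hypothetical tuning knobs chosen for illustration. Python's built-in `compile` and `exec` turn that source into a callable. A small timing loop then keeps the fastest variant on the current machine. The helper names `generate_matmul_source`, `build_kernel` and `autotune` are likewise hypothetical.

```python
import textwrap
import time


# Hypothetical sketch of run-time code generation (RTCG); not HPL's API.
def generate_matmul_source(tile: int, unroll: int) -> str:
    """Emit Python source for a tiled C = A @ B on n x n lists of lists.

    `tile` (block size) and `unroll` (k-iterations written out per step)
    are illustrative tuning parameters baked into the generated code.
    """
    # Unrolled multiply-accumulate terms for k, k+1, ..., k+unroll-1.
    terms = " + ".join(f"a_row[k + {u}] * b[k + {u}][j]" for u in range(unroll))
    return textwrap.dedent(f"""
        def matmul(a, b, n):
            c = [[0.0] * n for _ in range(n)]
            for ii in range(0, n, {tile}):
                for kk in range(0, n, {tile}):
                    k_end = min(kk + {tile}, n)
                    # End of the part of this k-block covered by full unrolled steps.
                    k_main = kk + (k_end - kk) // {unroll} * {unroll}
                    for jj in range(0, n, {tile}):
                        for i in range(ii, min(ii + {tile}, n)):
                            a_row, c_row = a[i], c[i]
                            for j in range(jj, min(jj + {tile}, n)):
                                acc = c_row[j]
                                for k in range(kk, k_main, {unroll}):
                                    acc += {terms}
                                for k in range(k_main, k_end):  # remainder
                                    acc += a_row[k] * b[k][j]
                                c_row[j] = acc
            return c
    """)


def build_kernel(tile: int, unroll: int):
    """Generate, compile and return one specialized matmul variant."""
    namespace = {}
    src = generate_matmul_source(tile, unroll)
    exec(compile(src, f"<matmul_t{tile}_u{unroll}>", "exec"), namespace)
    return namespace["matmul"]


def autotune(n: int, candidates, repeats: int = 3):
    """Time each (tile, unroll) candidate on n x n inputs; return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = None
    for tile, unroll in candidates:
        kernel = build_kernel(tile, unroll)
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(a, b, n)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, tile, unroll, kernel)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    tile, unroll, kernel = autotune(64, [(8, 1), (16, 2), (32, 4)])
    print(f"selected tile={tile}, unroll={unroll}")
```

The sketch mirrors one point from the abstract. A single generic, parameterized description yields many concrete kernels, and the choice among them is made on the target machine at run time rather than by hand. In a real heterogeneous setting the emitted code would typically be a device kernel (for example in OpenCL C), and the search space would cover more parameters. Pure Python stands in here only to keep the example self-contained and runnable.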
