Journal: ACM Transactions on Mathematical Software

Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods



Abstract

In this article, we explore the implementation of complex matrix multiplication. We begin by briefly identifying various challenges associated with the conventional approach, which calls for a carefully written kernel that implements complex arithmetic at the lowest possible level (i.e., assembly language). We then set out to develop a method of complex matrix multiplication that avoids the need for complex kernels altogether. This constraint promotes code reuse and portability within libraries such as the Basic Linear Algebra Subprograms (BLAS) and the BLAS-like Library Instantiation Software (BLIS), and allows kernel developers to focus their efforts on fewer and simpler kernels. We develop two alternative approaches, one based on the 3M method and one that reflects the classic 4M formulation, each with multiple variants, all of which rely only on real matrix multiplication kernels. We discuss the performance characteristics of these "induced" methods and observe that the assembly-level method actually resides along the 4M spectrum of algorithmic variants. Implementations are developed within the BLIS framework, and testing on modern hardware confirms that while the less numerically stable 3M method yields the fastest runtimes, the more stable (and thus widely applicable) 4M method's performance is somewhat limited due to implementation challenges that appear inherent in nature.
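
As a rough illustration of the two formulations named in the abstract, the sketch below computes a complex product C = A B using only real matrix products. It assumes the matrices are stored as separate real and imaginary parts (plain Python lists of rows). The helper names `real_matmul`, `matmul_4m`, and `matmul_3m` are hypothetical and are not taken from BLIS or from the article, and the naive triple loop merely stands in for an optimized real kernel.

```python
def real_matmul(a, b):
    """Naive real matrix product; a stand-in for an optimized real kernel."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_add(a, b):
    """Elementwise sum of two equally sized real matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    """Elementwise difference of two equally sized real matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul_4m(ar, ai, br, bi):
    """4M formulation: four real products.

    C_r = A_r B_r - A_i B_i
    C_i = A_r B_i + A_i B_r
    """
    cr = mat_sub(real_matmul(ar, br), real_matmul(ai, bi))
    ci = mat_add(real_matmul(ar, bi), real_matmul(ai, br))
    return cr, ci


def matmul_3m(ar, ai, br, bi):
    """3M formulation: three real products plus extra additions.

    P1 = A_r B_r, P2 = A_i B_i, P3 = (A_r + A_i)(B_r + B_i)
    C_r = P1 - P2
    C_i = P3 - P1 - P2
    """
    p1 = real_matmul(ar, br)
    p2 = real_matmul(ai, bi)
    p3 = real_matmul(mat_add(ar, ai), mat_add(br, bi))
    cr = mat_sub(p1, p2)
    ci = mat_sub(mat_sub(p3, p1), p2)
    return cr, ci


if __name__ == "__main__":
    # A = [[1+2j, 3-1j], [1j, 2]] and B = [[2-1j, 1j], [1+1j, 4]], split into parts.
    ar, ai = [[1, 3], [0, 2]], [[2, -1], [1, 0]]
    br, bi = [[2, 0], [1, 4]], [[-1, 1], [1, 0]]
    print(matmul_4m(ar, ai, br, bi))
    print(matmul_3m(ar, ai, br, bi))
```

With small integer inputs the two functions agree exactly. In floating point, the imaginary part in the 3M form is recovered by subtracting P1 and P2 from P3, which is where cancellation can cost accuracy, in line with the abstract's remark that 3M is the less numerically stable of the two.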

