Journal: IEEE Transactions on Signal Processing

Throughput-Distortion Computation of Generic Matrix Multiplication: Toward a Computation Channel for Digital Signal Processing Systems

Abstract

The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique applies adaptive scalar companding and rounding to input matrix blocks, followed by two forms of floating-point packing that allow multiple results to be calculated concurrently. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive the optimal throughput-distortion control framework for GEMM for the broad class of zero-mean, independent and identically distributed input sources. Our approach converts matrix multiplication in programmable processors into a computation channel: when the processing throughput increases, the output noise (error) increases due to (i) coarser quantization and (ii) computational errors caused by exceeding the machine-precision limits. We show that, when a certain amount of distortion is tolerated in the GEMM computation, the proposed framework can significantly surpass 100% of the peak performance of a given processor. The practical benefits of our proposal are shown in a face recognition system and a multilayer perceptron system trained for metadata learning from a large music feature database.
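
The general idea of packing in floating point can be illustrated with a small sketch. It is not the authors' scheme: the abstract does not specify the two packing forms or the companding rule, so the uniform quantizer, the power-of-two shift, and the names `quantize`, `packed_pair_matmul`, and `shift_bits` are all assumptions made for illustration. The sketch rounds inputs to integer levels, packs two integer-valued matrices into one floating-point matrix, and recovers two products from a single multiplication.

```python
import random

def matmul(X, Y):
    """Plain matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def quantize(M, step):
    """Uniform rounding to integer levels.

    Assumption: a simple stand-in for the adaptive companding and
    rounding mentioned in the abstract. A larger step means coarser
    quantization and therefore more distortion.
    """
    return [[float(round(v / step)) for v in row] for row in M]

def packed_pair_matmul(A1, A2, B, shift_bits):
    """Return (A1 @ B, A2 @ B) using one matrix product.

    All inputs must hold integer values stored as floats. The result
    is exact only if every entry of A1 @ B is smaller in magnitude
    than 2**(shift_bits - 1) and every packed intermediate stays
    below 2**53, the exact-integer range of a double.
    """
    S = float(2 ** shift_bits)
    # Pack: A1 occupies the low part of each float, A2 the high part.
    P = [[a1 + S * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    R = matmul(P, B)  # equals A1 @ B + S * (A2 @ B)
    # Unpack: rounding R / S isolates the high part, the rest is the low part.
    hi = [[float(round(v / S)) for v in row] for row in R]
    lo = [[v - S * h for v, h in zip(rv, rh)] for rv, rh in zip(R, hi)]
    return lo, hi

if __name__ == "__main__":
    random.seed(0)
    rand = lambda r, c: [[random.uniform(-4, 4) for _ in range(c)] for _ in range(r)]
    A1, A2, B = (quantize(rand(3, 4), 0.5) for _ in range(3))
    B = quantize(rand(4, 3), 0.5)
    lo, hi = packed_pair_matmul(A1, A2, B, shift_bits=20)
    print(lo == matmul(A1, B), hi == matmul(A2, B))
```

In this sketch, a coarser `step` shrinks the integer range and leaves room to pack more aggressively, while a `shift_bits` value that is too small for the data lets the two results bleed into each other. These loosely correspond to the two error sources named in the abstract: coarser quantization and exceeding machine precision.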
