Journal: IEEE Transactions on Computers

Architecture and implementation of a vector/SIMD multiply-accumulate unit

Abstract

This work presents a 64-bit fixed-point vector multiply-accumulator (MAC) architecture capable of supporting multiple precisions. The vector MAC can perform one 64×64, two 32×32, four 16×16, or eight 8×8-bit signed/unsigned multiplies using essentially the same hardware as a scalar 64-bit MAC and with only a small increase in delay. The scalar MAC architecture is "vectorized" by inserting mode-dependent multiplexing into the partial product generation and by inserting mode-dependent kills in the carry chain of the reduction tree and the final carry-propagate adder. This is an example of "shared segmentation", in which the existing scalar structure is segmented and then shared between vector modes. The vector MAC is area efficient and can be fully pipelined, which makes it suitable for high-performance processors and, possibly, dynamically reconfigurable processors. The "shared segmentation" method is compared to an alternative method, referred to as the "shared subtree" method, by implementing vector MAC designs using two different technologies and three different vector widths.
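The two mechanisms named in the abstract, mode-dependent selection of partial products and mode-dependent carry kills, can be illustrated with a small behavioral model. The sketch below is not the paper's circuit: it assumes unsigned operands only, a plain bit-serial adder instead of a reduction tree, and lane results that wrap modulo 2^(2n) for n-bit lanes. The function names (`segmented_add`, `masked_partial_products`, `vector_mac`) are hypothetical and chosen for this illustration.

```python
def segmented_add(a, b, lane_bits, width=64):
    """Add two width-bit words, killing the carry at every lane boundary."""
    result = 0
    carry = 0
    for i in range(width):
        if i % lane_bits == 0:
            carry = 0  # mode-dependent kill: no carry enters a new lane
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def masked_partial_products(a, b, lane_bits, width=64):
    """Return the partial-product rows of a*b, keeping only the bits of `a`
    that lie in the same lane as the current multiplier bit.

    This models the mode-dependent multiplexing: the rows of the scalar
    partial-product array are reused, but cross-lane terms are zeroed.
    """
    lane_mask = (1 << lane_bits) - 1
    rows = []
    for j in range(width):
        if (b >> j) & 1:
            lane = j // lane_bits
            a_lane = a & (lane_mask << (lane * lane_bits))
            rows.append(a_lane << j)
    return rows


def vector_mac(a, b, acc, lane_bits, width=64):
    """Unsigned vector MAC: each n-bit lane k computes
    (a_k * b_k + acc_k) mod 2**(2n), stored in bits [2kn, 2kn + 2n)."""
    field_bits = 2 * lane_bits  # each lane product occupies a 2n-bit field
    total = acc
    for row in masked_partial_products(a, b, lane_bits, width):
        total = segmented_add(total, row, field_bits, 2 * width)
    return total


def reference_mac(a, b, acc, lane_bits, width=64):
    """Lane-by-lane reference used only to check the model above."""
    n = lane_bits
    out = 0
    for k in range(width // n):
        a_k = (a >> (k * n)) & ((1 << n) - 1)
        b_k = (b >> (k * n)) & ((1 << n) - 1)
        c_k = (acc >> (2 * k * n)) & ((1 << (2 * n)) - 1)
        out |= ((a_k * b_k + c_k) & ((1 << (2 * n)) - 1)) << (2 * k * n)
    return out
```

In this model, `vector_mac(0x0203, 0x0405, 0, 8)` multiplies the byte lanes independently (3×5 and 2×4) and places the results in separate 16-bit fields, while `lane_bits=64` reduces to the ordinary scalar 64×64 multiply-accumulate. The same loop body serves every mode; only the lane width passed to the masking and carry-kill steps changes, which is the sense in which the scalar structure is shared.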