Architecture and implementation of a vector/SIMD multiply-accumulate unit

Danysh A.; Tan D.


Abstract

This work presents a 64-bit fixed-point vector multiply-accumulator (MAC) architecture capable of supporting multiple precisions. The vector MAC can perform one 64×64, two 32×32, four 16×16, or eight 8×8 bit signed/unsigned multiplies using essentially the same hardware as a scalar 64-bit MAC and with only a small increase in delay. The scalar MAC architecture is "vectorized" by inserting mode-dependent multiplexing into the partial product generation and by inserting mode-dependent kills in the carry chain of the reduction tree and the final carry-propagate adder. This is an example of "shared segmentation", in which the existing scalar structure is segmented and then shared between vector modes. The vector MAC is area efficient and can be fully pipelined, which makes it suitable for high-performance processors and, possibly, dynamically reconfigurable processors. The "shared segmentation" method is compared to an alternative method, referred to as the "shared subtree" method, by implementing vector MAC designs using two different technologies and three different vector widths.
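The behavior described in the abstract can be illustrated with a small software model. The Python sketch below is not the paper's hardware design: the function names (`split_lanes`, `vector_mac`, `segmented_add`), the per-lane accumulator width of twice the lane width, and the bit-serial ripple adder are all assumptions made for illustration. It shows the four lane configurations of a 64-bit word and how killing the carry at lane boundaries lets one 64-bit adder act as several independent narrower adders.

```python
"""Behavioral sketch of a multi-precision vector MAC (illustrative, not the paper's RTL)."""

WIDTH = 64


def split_lanes(word, lane_bits):
    """Split a 64-bit word into unsigned lanes, least significant lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> shift) & lane_mask for shift in range(0, WIDTH, lane_bits)]


def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


def vector_mac(a, b, acc, lane_bits, signed=False):
    """Lane-wise acc + a*b: one 64x64, two 32x32, four 16x16 or eight 8x8 products.

    `acc` holds one accumulator per lane. Each result is truncated to
    2*lane_bits bits, which is an assumption: the abstract does not state
    the accumulator width.
    """
    prod_mask = (1 << (2 * lane_bits)) - 1
    out = []
    for x, y, c in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits), acc):
        if signed:
            x, y = to_signed(x, lane_bits), to_signed(y, lane_bits)
        out.append((c + x * y) & prod_mask)
    return out


def segmented_add(x, y, lane_bits):
    """64-bit ripple-carry add whose carry is killed at every lane boundary."""
    result, carry = 0, 0
    for bit in range(WIDTH):
        if bit % lane_bits == 0:
            carry = 0  # mode-dependent kill: no carry crosses into the next lane
        xb, yb = (x >> bit) & 1, (y >> bit) & 1
        result |= (xb ^ yb ^ carry) << bit
        carry = (xb & yb) | (carry & (xb ^ yb))
    return result
```

In this model, `segmented_add(0x00FF, 0x0001, 8)` returns `0x0000` because the carry out of the low byte is killed, while the same call with `lane_bits=16` returns `0x0100`. The adder loop itself is identical in every mode; only the positions at which the carry is cleared change, which is the sense in which the scalar structure is segmented and then shared.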


Bibliographic record

Source: IEEE Transactions on Computers, 2005, No. 3, pp. 284-293 (10 pages)
Authors: Danysh A.; Tan D.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: adders; carry logic; parallel architectures; 64 bit; SIMD; VLSI; data-path design; final carry-propagate adder; fixed-point vector multiply-accumulator; high-speed arithmetic; multimedia; multiplier; reduction tree; shared segmentation


Similar literature

1. A high-performance and low-power 32-bit multiply-accumulate unit with single-instruction-multiple-data (SIMD) feature [J]. Yuyun Liao, Roberts D.B. IEEE Journal of Solid-State Circuits. 2002, No. 7
2. Vectorization for SIMD Architectures with alignment constraints [J]. Eichenberger AE, Wu P, O'Brien K. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2004, No. 6
3. Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing [J]. Camus Vincent, Mei Linyan, Enz Christian. IEEE Journal on Emerging and Selected Topics in Circuits and Systems. 2019, No. 4
4. A novel vector/SIMD multiply-accumulate unit based on reconfigurable booth array [C]. 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology. 2010
5. Implementation of JPEG image compression on the DSP-RAM single-chip SIMD architecture [D]. Ai, Hua. 2003
6. Learning from Health Information Exchange Technical Architecture and Implementation in Seven Beacon Communities [O]. Douglas B. McCarthy, Karen Propp, Alexander Cohen
7. Stable Vector Operation implementations, using Intel's SIMD Architecture [O]. 2018
