A faster distributed arithmetic architecture for FPGAs

Abstract

Distributed Arithmetic (DA) is an important technique for implementing digital signal processing (DSP) functions in FPGAs. However, traditional lookup table (LUT) based DA architectures contain one or more carry-propagation chains in the critical path, which limits the maximum speed at which the entire design can run. In this paper, we describe a novel technique that can reduce or eliminate the carry-propagate chain from the critical path in LUT-based DA architectures on FPGAs. In the proposed scheme, the individual bits of a word do not have to be processed as a unit. Instead, the current iteration can start as soon as the least significant bit (LSB) of the previous iteration is available, without waiting for the entire word from the previous iteration to be fully computed. This technique has great potential for speeding up DSP applications based on DA. Designs are described for serial and parallel DALUT and accumulator structures in which an n-bit carry chain, where n is the word length, is broken into smaller r-bit chains, 1 ≤ r < n. A cost-performance analysis of the designs is presented. The analysis shows that the designs proposed in this paper have a lower cost-performance ratio (indicating better performance) than traditional DA designs. We also show that the 8-bit (r = 8) designs offer a good compromise between cost and performance. The implementation is on a Xilinx XC4028XL-3-BG256 chip using Xilinx Foundation tools v3.1i. The results show that the proposed designs can achieve a speedup by a factor of at least 1.5 over traditional DA designs in some cases.
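To make the terms in the abstract concrete, the following is a minimal software sketch of conventional bit-serial, LUT-based DA for an inner product, plus a toy addition that splits an n-bit carry chain into r-bit segments. It is not the authors' architecture: the function names (`build_da_lut`, `da_inner_product`, `segmented_add`) and the integer coefficient format are assumptions made for illustration, and the segmented adder only shows where the carry chain is cut, not how the paper overlaps successive iterations in hardware.

```python
from typing import List


def build_da_lut(coeffs: List[int]) -> List[int]:
    """LUT entry `addr` holds the sum of coeffs[k] for every set bit k of addr."""
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


def da_inner_product(coeffs: List[int], xs: List[int], n: int) -> int:
    """Bit-serial DA: one LUT lookup and one accumulate per input bit, LSB first.

    xs are n-bit two's-complement integers in [-2**(n-1), 2**(n-1) - 1].
    """
    lut = build_da_lut(coeffs)
    mask = (1 << n) - 1
    acc = 0
    for b in range(n):
        # Gather bit b of every input word into one LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= (((x & mask) >> b) & 1) << k
        term = lut[addr] << b
        # The most significant bit has negative weight in two's complement.
        acc += -term if b == n - 1 else term
    return acc


def segmented_add(a: int, b: int, n: int, r: int) -> int:
    """Add two n-bit unsigned words in r-bit segments (hypothetical helper).

    Each segment needs only an r-bit carry chain; a single carry bit is
    handed from one segment to the next. The result is taken modulo 2**n.
    """
    seg_mask = (1 << r) - 1
    carry = 0
    result = 0
    for shift in range(0, n, r):
        s = ((a >> shift) & seg_mask) + ((b >> shift) & seg_mask) + carry
        result |= (s & seg_mask) << shift
        carry = s >> r
    return result & ((1 << n) - 1)


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]
    xs = [10, -4, 0, -128]
    direct = sum(c * x for c, x in zip(coeffs, xs))
    print(da_inner_product(coeffs, xs, n=8) == direct)
    print(segmented_add(0x00FF, 0x0001, n=16, r=8) == 0x0100)
```

In this sketch the accumulate step in `da_inner_product` corresponds to the full-width addition whose carry chain sits in the critical path of a traditional design, and `segmented_add` shows the same sum computed with r-bit chains, using r = 8 as in the compromise the abstract mentions.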

Bibliographic record

Source
Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays, 2002, pp. 31-39 (9 pages)

Conference location: Monterey, CA (US)

Authors
Radhika S. Grover; Weijia Shang; Qiang Li

Author affiliation

Santa Clara University, CA

Original format: PDF

Text language: English

Chinese Library Classification: Computing technology, computer technology

Keywords
distributed arithmetic
