
A faster distributed arithmetic architecture for FPGAs



Abstract

Distributed arithmetic (DA) is an important technique for implementing digital signal processing (DSP) functions in FPGAs. However, traditional lookup table (LUT) based DA architectures contain one or more carry-propagation chains in the critical path, which limits the maximum speed at which the entire design can run. In this paper, we describe a novel technique that can reduce or eliminate the carry-propagation chain from the critical path in LUT-based DA architectures on FPGAs. In the proposed scheme, the individual bits of a word do not have to be processed as a unit. Instead, the current iteration can start as soon as the least significant bit (LSB) of the previous iteration is available, without waiting for the entire word from the previous iteration to be fully computed. This technique has great potential for speeding up DA-based DSP applications. Designs are described for serial and parallel DALUT and accumulator structures in which an n-bit carry chain, where n is the word length, is broken into smaller r-bit chains, 1 ≤ r < n. A cost-performance analysis of the designs is presented. The analysis shows that the designs proposed in this paper have a lower cost-performance ratio (indicating better performance) than traditional DA designs. We also show that the 8-bit (r = 8) designs offer a good compromise between cost and performance. The designs are implemented on a Xilinx XC4028XL-3-BG256 chip using Xilinx Foundation tools v3.1i. The results show that, in some cases, the proposed designs achieve a speedup of at least 1.5 over traditional DA designs.
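
To make the idea in the abstract concrete, the following is a minimal behavioral sketch in Python, not the paper's hardware design. It assumes unsigned inputs and non-negative coefficients, and the names `build_da_lut`, `chunked_add`, and `da_inner_product` are hypothetical, chosen only for illustration. It shows a LUT-based, bit-serial DA inner product whose accumulator addition is split into r-bit pieces processed LSB first, so that only an r-bit carry chain is exercised per step. It checks functional equivalence only; it does not model the timing overlap between iterations that gives the reported speedup.

```python
from typing import List


def build_da_lut(coeffs: List[int]) -> List[int]:
    """Entry `addr` holds the sum of coeffs[i] for every bit i set in addr."""
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << len(coeffs))
    ]


def chunked_add(a: int, b: int, n: int, r: int) -> int:
    """Add two n-bit unsigned values with r-bit adders, LSB chunk first.

    The carry out of each r-bit chunk is handed to the next chunk, the way
    a register between short carry chains would do in hardware (assumption).
    The result wraps modulo 2**n.
    """
    mask = (1 << r) - 1
    result, carry = 0, 0
    for shift in range(0, n, r):
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> r
    return result & ((1 << n) - 1)


def da_inner_product(coeffs: List[int], xs: List[int],
                     width: int, n: int, r: int) -> int:
    """Compute sum(coeffs[i] * xs[i]) bit-serially via a DA lookup table.

    width: bit width of each input in xs; n: accumulator word length;
    r: length of each short carry chain (1 <= r <= n in this sketch).
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for j in range(width):  # process input bit-planes LSB first
        # The LUT address is formed from bit j of every input word.
        addr = sum(((x >> j) & 1) << i for i, x in enumerate(xs))
        acc = chunked_add(acc, lut[addr] << j, n, r)
    return acc


if __name__ == "__main__":
    coeffs, xs = [3, 5, 7, 2], [10, 4, 15, 9]
    for r in (1, 4, 8, 16):
        print(r, da_inner_product(coeffs, xs, width=4, n=16, r=r))
```

For these illustrative inputs every choice of r yields the same inner product, 173, which matches the direct computation 3·10 + 5·4 + 7·15 + 2·9. Choosing r only changes how long each carry chain is, which is the cost-performance trade-off the abstract refers to.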
