Journal: ACM Transactions on Reconfigurable Technology and Systems

Domain-Specific Optimization of Signal Recognition Targeting FPGAs

Abstract

Domain-specific optimizations of matrix computations that exploit specific arithmetic and matrix representation formats have achieved significant performance/area gains in Field-Programmable Gate Array (FPGA) hardware designs. In this article, we explore applying data-driven optimizations, which reduce both storage and computation requirements, to the problem of signal recognition from a known dictionary. Starting from a high-level mathematical representation of a signal recognition problem, we perform optimizations across the layers of the system, exploiting mathematical structure to improve implementation efficiency. Specifically, we use Walsh wavelet packets in conjunction with a BestBasis algorithm to distinguish between spoken digits. The resulting transform matrices are quite sparse and exhibit a rich algebraic structure with significant overlap across rows. As a consequence, the dot products of the transform matrix rows with signal vectors exhibit significant computation reuse, that is, repeated identical computations. We present an algorithm for identifying this computation reuse and for scheduling the row computations. We exploit this reuse to derive FPGA hardware implementations that reduce the amount of computation for an individual matrix by as much as 6.35x, and by an average of 2x, for a single dot-product unit. The implementation that exploits reuse achieves a 2x computation reduction compared to three concurrently executing simpler accumulator units with the same aggregate design area, and it outperforms software implementations on high-end desktop personal computers.
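
To make the idea of computation reuse across overlapping rows concrete, the following is a minimal Python sketch. It is not the article's algorithm: it assumes a simple stand-in scheme in which each sparse row with +1/-1 coefficients is cut into fixed-size chunks of terms, and the partial sum of any chunk that appears in more than one row is computed once and cached. The function name `dot_products_with_reuse`, the `chunk` parameter, and the example matrix are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A term is (column index, coefficient); coefficients are +1 or -1 in this sketch.
Term = Tuple[int, int]


def dot_products_with_reuse(
    rows: Sequence[Sequence[Term]], x: Sequence[int], chunk: int = 2
) -> Tuple[List[int], int, int]:
    """Compute row-by-vector dot products, caching partial sums shared by rows.

    Returns (results, naive_adds, reuse_adds), where the last two values count
    the additions needed without and with the cache.
    """
    cache: Dict[Tuple[Term, ...], int] = {}
    naive_adds = 0
    reuse_adds = 0
    results: List[int] = []
    for row in rows:
        terms = sorted(row)
        # Without reuse, summing n terms takes n - 1 additions.
        naive_adds += max(len(terms) - 1, 0)
        partials: List[int] = []
        for i in range(0, len(terms), chunk):
            key = tuple(terms[i:i + chunk])
            if key not in cache:
                # First time this group of terms is seen: compute and store it.
                cache[key] = sum(c * x[j] for j, c in key)
                reuse_adds += len(key) - 1
            partials.append(cache[key])
        # Combining k partial sums takes k - 1 additions.
        reuse_adds += max(len(partials) - 1, 0)
        results.append(sum(partials))
    return results, naive_adds, reuse_adds


# Hypothetical sparse rows whose term groups overlap.
ROWS = [
    [(0, 1), (1, 1), (2, 1), (3, 1)],
    [(0, 1), (1, 1), (2, -1), (3, -1)],
    [(0, 1), (1, 1), (4, 1), (5, 1)],
    [(2, 1), (3, 1), (4, 1), (5, 1)],
]
X = [1, 2, 3, 4, 5, 6, 7, 8]

results, naive_adds, reuse_adds = dot_products_with_reuse(ROWS, X)
print(results, naive_adds, reuse_adds)
```

In this toy example the cached version needs 8 additions where the row-by-row version needs 12. How much is saved depends entirely on how the terms are grouped and on the order in which rows are processed, which is why identifying the reuse and scheduling the rows are treated as problems in their own right.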
