Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA

Abstract

In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive, power hungry and require a large amount of memory, which creates a high demand for efficient algorithm implementation, low power architectures and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), and medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow. The most important achievements of the work presented in this thesis are summarised here. Two factorisation methodologies for HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed.
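To make the idea of factorising the HWT concrete, here is a minimal sketch in Python. It does not reproduce HWTFM1 or HWTFM2, which the abstract does not specify. It uses the textbook butterfly factorisation of the unnormalised Haar matrix as a stand-in, and the function names (`haar_matrix`, `haar_direct`, `haar_fast`) are hypothetical. It only illustrates how splitting the transform into sparse stages lowers the number of additions and subtractions relative to a direct matrix-vector product.

```python
def haar_matrix(n):
    """Unnormalised Haar matrix of size n x n (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        m = len(h)
        # Top half: every existing row with each entry repeated twice (row ⊗ [1, 1]).
        top = [[v for v in row for _ in (0, 1)] for row in h]
        # Bottom half: identity ⊗ [1, -1], i.e. differences of adjacent pairs.
        bottom = [[1 if j == 2 * i else -1 if j == 2 * i + 1 else 0
                   for j in range(2 * m)] for i in range(m)]
        h = top + bottom
    return h


def haar_direct(x):
    """Direct matrix-vector product; returns (coefficients, add/sub count)."""
    out, ops = [], 0
    for row in haar_matrix(len(x)):
        terms = [c * v for c, v in zip(row, x) if c != 0]
        out.append(sum(terms))
        ops += len(terms) - 1  # k non-zero terms need k - 1 additions/subtractions
    return out, ops


def haar_fast(x):
    """Stage-by-stage (factorised) evaluation; returns (coefficients, add/sub count)."""
    out, n, ops = list(x), len(x), 0
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs  # sums feed the next stage, differences are final
        ops += n                # one add and one subtract per input pair
        n = half
    return out, ops


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]  # arbitrary illustrative input
    print(haar_direct(sample))
    print(haar_fast(sample))
```

Both functions return the same coefficients. The operation counts they report apply to this textbook factorisation only and should not be read as the thesis's figures.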
The evaluation of the architectural results has shown that the proposed architectures reduce the arithmetic operations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated and the results outperformed some other existing architectures. Both FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256) and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes. Two architectures for cyclic convolution based on systolic arrays using parallelism and pipelining, which can be used as the main building block for the proposed FRIT architecture, have been proposed. The first proposed architecture is a linear systolic array with a pipelined process and the second is a systolic array with a parallel process. The second architecture reduces the number of registers by 42% compared to the first architecture, and both architectures outperformed other existing results.
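For readers unfamiliar with the cyclic convolution that the systolic arrays compute, the following is a minimal software sketch of the operation itself, written directly from its definition y[i] = Σ_k x[k]·h[(i − k) mod N]. It is not a model of either proposed architecture. The function name `cyclic_convolution` and the sample vectors are hypothetical.

```python
def cyclic_convolution(x, h):
    """Length-N cyclic (circular) convolution of two equal-length sequences."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length N")
    # Index (i - k) % n wraps around, which is what makes the convolution cyclic.
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4]                             # illustrative vector, N = 4
    print(cyclic_convolution(x, [1, 0, 0, 0]))   # unit impulse leaves x unchanged
    print(cyclic_convolution(x, [0, 1, 0, 0]))   # delayed impulse rotates x by one
```

Each output sample needs N multiply-accumulate steps, so the whole operation maps naturally onto an array of N identical processing elements. That regularity is presumably why it suits a systolic implementation.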
The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes (N) of 4, 8, 16 and 32 and a word length (W) of 8. The implementation results have shown a significant improvement and outperformed other existing results. Finally, an in-depth evaluation of a high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated on a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the accuracy of the proposed model is almost 99% for the Dynamic Power (DP) components of all IP cores.