Memory Efficient Modular VLSI Architecture for Highthroughput and Low-Latency Implementation of Multilevel Lifting 2-D DWT

Mohanty B. K.; Meher P. K.


Abstract

In this paper, we present a modular and pipelined architecture for lifting-based multilevel 2-D DWT that uses neither a line-buffer nor a frame-buffer. The overall area-delay product is reduced in the proposed design by appropriate partitioning and scheduling of the computation of the individual decomposition levels. The processing for different levels is performed by a cascaded pipeline structure to maximize the hardware utilization efficiency (HUE). Moreover, the proposed structure is scalable for high-throughput and area-constrained implementation. We have removed all the redundancies resulting from decimated wavelet filtering to maximize the HUE. The proposed design involves $L$ pyramid algorithm (PA) units and one recursive pyramid algorithm (RPA) unit, where $R = N/P$, $L = \lceil \log_{4} P \rceil$, $P$ is the input block size, and $M$ and $N$ are, respectively, the height and width of the image. The entire multilevel DWT is computed by the proposed structure in $MR$ cycles. The proposed structure has $O(8R \times 2^{L})$ cycles of output latency, which is very small compared to the latency of the existing structures. Interestingly, the proposed structure does not require any line-buffer or frame-buffer, unlike the existing folded structures, which require a line-buffer of size $O(N)$ and a frame-buffer of size $O(M/2 \times N/2)$ for multilevel 2-D computation. Instead of those buffers, the proposed structure involves only local registers and RAM of size $O(N)$. The saving of line-buffer and frame-buffer achieved by the proposed design is an important advantage, since the image size could very often be as large as $512 \times 512$. From the simulation results, we find that the proposed scalable structure offers a better slice-delay-product (SDP) for higher-throughput implementation, since the on-chip memory of this structure remains almost unchanged with input block size. It has 17% less SDP than the best of the corresponding existing structures on average, for different input-block sizes and image sizes. It involves 1.92 times more transistors, but offers 12.2 times higher throughput and consumes 52% less power per output (PPO) compared to the other, on average for different input sizes.
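
The closed-form quantities quoted in the abstract can be evaluated for a concrete image and block size. The following is a minimal sketch, not the authors' code: the function name `dwt_schedule` and its dictionary keys are hypothetical, the sketch assumes that $P$ divides $N$, and the latency figure is only the term inside the $O(\cdot)$ bound, so it indicates growth rather than an exact cycle count.

```python
def dwt_schedule(M: int, N: int, P: int) -> dict:
    """Evaluate the abstract's formulas for an M x N image and input block size P.

    Hypothetical helper for illustration; assumes P divides N.
    """
    if P < 1 or N % P != 0:
        raise ValueError("this sketch assumes a positive block size P that divides N")
    R = N // P  # R = N / P
    # L = ceil(log4(P)), computed with integers to avoid floating-point rounding:
    # the smallest L such that 4**L >= P.
    L = 0
    while 4 ** L < P:
        L += 1
    return {
        "R": R,
        "L": L,
        "pa_units": L,                 # L pyramid algorithm (PA) units
        "rpa_units": 1,                # one recursive pyramid algorithm (RPA) unit
        "total_cycles": M * R,         # whole multilevel DWT in M*R cycles
        "latency_term": 8 * R * 2 ** L,  # term inside O(8R x 2^L), not an exact count
    }


params = dwt_schedule(M=512, N=512, P=16)
print(params)
```

For a $512 \times 512$ image with $P = 16$, this gives $R = 32$, $L = 2$, and $MR = 16384$ cycles. Increasing $P$ shortens the cycle count in proportion, which is the throughput scaling the abstract describes.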


Bibliographic details

Source
Signal Processing, IEEE Transactions on | 2011, No. 5 | pp. 2072-2084 | 13 pages
Authors
Mohanty B. K.; Meher P. K.
Author affiliation

Dept. of Electronics and Communication Engineering, Jaypee University of Engineering and Technology, Raghogarh, Guna, India;

Original format: PDF
Text language: English
Keywords
2-dimensional (2-D) DWT; Discrete wavelet transform (DWT); VLSI; lifting; systolic array;


Similar literature


1. Memory-Efficient VLSI Architecture for 2-D Integer Lifting-Based DWT using Interlaced Read Scan Algorithm [J]. Chih-Hsien Hsia, Jen-Shiun Chiang. WSEAS Transactions on Circuits and Systems. 2007, No. 3
2. Efficient-Block-Processing Parallel Architecture for Multilevel Lifting 2-D DWT [J]. Basant K. Mohanty, Anurag Mahajan. Journal of Low Power Electronics. 2013, No. 1
3. An Efficient VLSI Architecture and FPGA Implementation of High-Speed and Low Power 2-D DWT for (9, 7) Wavelet Filter [J]. A. Mansouri, A. Ahaitouf, F. Abdi. International Journal of Computer Science and Network Security. 2009, No. 3
4. Hardware efficient recursive VLSI architecture for multilevel lifting 2-D DWT [C]. Darji A.D., Trivedi Nisarg, Merchant S.N. ISCAS 2012; IEEE International Symposium on Circuits and Systems. 2012
5. VLSI design optimization for lifting scheme DWT [D]. Li, Jian. 2005
6. Real-time imaging with radial GRAPPA: Implementation on a Heterogeneous Architecture for Low-Latency Reconstructions [O]. Haris Saybasili, Daniel A. Herzka, Nicole Seiberlich.
7. Multiple-lifting Scheme: Memory-efficient VLSI Implementation for Line-based 2-D DWT [O]. Chih-chi Cheng, Chao-tsung Huang, Liang-gee Chen. 2013
