ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree

Bai Yuhui; Ahmed Syed Zahid; Granado Bertrand

Abstract

Embedded image processing systems such as smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make efficient image coding important. In this article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data (ASWD) scheme targeting an embedded platform. ASWD is a context modeling block, implemented via priority queues in a wavelet-based image coder, that reorganizes the wavelet coefficients into locally stationary sequences. The architecture we propose makes efficient, adaptive use of the FPGA's on-chip dual-port memories. An index-aware system linked to each element in the queue makes the location of each queue element traceable in the heap, as the ASWD algorithm requires. Moreover, the use of 4-port memories together with intelligent data concatenation of queue elements yields cost-effective, enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases, so that each phase makes the best use of the memory accesses it requires. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of a generic priority queue, or in classical sorting applications that sort into the index. We designed and validated the hardware on an Altera Stratix IV FPGA as an IP accelerator in a Nios II processor-based System on Chip. We show that our architecture at 150 MHz can provide a 45x speedup compared to an embedded ARM Cortex-A9 processor at 666 MHz, targeting a throughput of 10 MB/s.
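The index-aware idea mentioned in the abstract can be illustrated in software. The following is only a minimal sketch, not the authors' hardware design: it assumes a binary max-heap plus a position table that records where each element currently sits, so an element can be located and re-prioritized without searching the heap. The class name `IndexAwareHeap`, its methods, and the choice of a max-heap are all hypothetical and chosen for illustration.

```python
class IndexAwareHeap:
    """Binary max-heap that tracks the heap index of every element id.

    Illustrative software sketch only; names and layout are assumptions,
    not the architecture described in the article.
    """

    def __init__(self):
        self.heap = []  # list of (priority, elem_id) pairs
        self.pos = {}   # elem_id -> current index in self.heap

    def _swap(self, i, j):
        # Swap two heap slots and keep the position table in sync.
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def push(self, elem_id, priority):
        if elem_id in self.pos:
            raise KeyError(f"{elem_id!r} is already queued")
        self.heap.append((priority, elem_id))
        self.pos[elem_id] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def pop(self):
        # Remove and return the (elem_id, priority) with the highest priority.
        top_priority, top_id = self.heap[0]
        last = self.heap.pop()
        del self.pos[top_id]
        if self.heap:
            self.heap[0] = last
            self.pos[last[1]] = 0
            self._sift_down(0)
        return top_id, top_priority

    def update(self, elem_id, new_priority):
        # The position table gives the slot directly; no linear search needed.
        i = self.pos[elem_id]
        old_priority = self.heap[i][0]
        self.heap[i] = (new_priority, elem_id)
        if new_priority > old_priority:
            self._sift_up(i)
        else:
            self._sift_down(i)


queue = IndexAwareHeap()
queue.push("a", 3)
queue.push("b", 5)
queue.push("c", 1)
queue.update("c", 9)   # "c" is found through the position table
first = queue.pop()    # element with the highest priority
```

In this sketch the position table is what makes each element "traceable": `update` costs one table lookup plus a single sift, rather than a scan of the whole heap. How the article maps such a table onto FPGA memories is not stated in the abstract and is not modeled here.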

Bibliographic record

Source
ACM Transactions on Reconfigurable Technology and Systems | 2016, Issue 1 | 8.1-8.16 | 16 pages
Authors
Bai Yuhui; Ahmed Syed Zahid; Granado Bertrand;
Author affiliations

Univ Cergy Pontoise ENSEA UMR CNRS 8051 ETIS Cergy France;

Univ Paris 06 Univ Paris 04 UMR7606 LIP6 F-75005 Paris France;

Indexing information
Original format: PDF
Language: English
Keywords
Design; Experimentation; Performance; Image compression; adaptive scanning; priority queue; heapsort; system-on-chip; embedded system; FPGA;

Similar literature

1. A Heap-Based Concurrent Priority Queue with Mutable Priorities for Faster Parallel Algorithms [J]. Orr Tamir, Adam Morrison, Noam Rinetzky. LIPIcs: Leibniz International Proceedings in Informatics. 2016, Issue 4
2. VHDL Design and FPGA Implementation of a Fully Parallel Architecture for Iterative Decoder of Majority Logic Codes for High Data Rate Applications [J]. M. El Haroussi, M. Belkasmi. Journal of Wireless Networking and Communications. 2012, Issue 4
3. Design and FPGA Implementation of a Fully Parallel Architecture for Turbo Decoding of Majority Logic Codes for High Data Rate Applications [J]. M. El Haroussi, M. Belkasmi. International Journal of Computing & Information Technology. 2011, Issue 2
4. Accelerating Heap-Based Priority Queue in Image Coding Application Using Parallel Index-Aware Tree Access [C]. Yuhui Bai, Syed Zahid Ahmed, Bertrand Granado. International Symposium on Applied Reconfigurable Computing. 2014
5. Implementation of parallel imaging techniques for lipid unaliasing and faster acquisition for improving spatial characterization of magnetic resonance spectroscopic imaging of gliomas [D]. Ozturk Isik, Esin. 2007
6. Fully Parallel Implementation of Otsu Automatic Image Thresholding Algorithm on FPGA [O]. Wysterlânya K. P. Barros, Leonardo A. Dias, Marcelo A. C. Fernandes. 2021
7. A Heap-Based Concurrent Priority Queue with Mutable Priorities for Faster Parallel Algorithms [O]. Tamir Orr, Morrison Adam, Rinetzky Noam. 2016
8. Fast Parallel Tree Codes for Gravitational and Fluid Dynamical N-Body Problems [R]. Salmon, J. K., Warren, M. S., Winckelmans, G. S. 1993
