Conference: IEEE Annual International Symposium on Field-Programmable Custom Computing Machines

A Network-on-Chip Based H.264 Video Decoder Prototype Implemented on FPGAs



Abstract

We present a field-programmable gate array (FPGA) based implementation of the H.264 video decoder algorithm. The novelty of our design is that the decoder modules communicate over a network-on-chip (NoC). This makes the design scalable and easy to integrate into larger future NoC-based systems, where the same hardware platform can host other algorithms such as compression and filtering. Our primary objective is to study the performance achievable with a NoC-based H.264 decoder. The design process involves three main steps. First, the H.264 algorithm is split into eight partitions, each implemented as an individual processing element (PE). These PEs are attached to the routers of a regular mesh NoC and comprise: network abstraction layer (NAL) parser and entropy decoder, frame buffer and integer motion, inverse quantization and inverse transform, intra prediction, luma sub-pixel motion, chroma sub-pixel motion, deblocking filter, and display driver. The PEs are described in VHDL, with the first two executed on Nios II soft cores. The NoC was generated with the Connect tool from Carnegie Mellon University and integrated into the top-level design entity. Second, we specify the location of each PE inside the mesh. Because we use eight PEs, the NoC architecture needs to be a 3x3 regular mesh topology. Specifying the location of the PEs inside the mesh (i.e., the router to which each PE is attached) effectively solves what is called the NoC mapping problem. We map manually, guided by knowledge of the internal structure of the decoding algorithm, which helps to reduce the number of routers that packets must traverse. Finally, the entire project is synthesized, placed, and routed with the Quartus Prime Standard Edition 16.1 tool.
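The mapping step can be illustrated with a small sketch. It is not the authors' tool or their actual placement: the PE identifiers, the list of communicating PE pairs (`FLOWS`), and both placements below are assumptions made up for illustration, and hop counts assume minimal routing on the mesh (Manhattan distance between routers). The sketch only shows how one candidate mapping can be compared against another by total hop count.

```python
import math

# The eight PEs named in the abstract (identifiers are hypothetical).
PES = ["nal_entropy", "frame_buffer", "iqit", "intra_pred",
       "luma_mc", "chroma_mc", "deblock", "display"]

# ASSUMPTION: which PEs exchange packets. The abstract does not give
# the traffic pattern; these pairs are illustrative only.
FLOWS = [
    ("nal_entropy", "iqit"), ("nal_entropy", "intra_pred"),
    ("nal_entropy", "frame_buffer"), ("frame_buffer", "luma_mc"),
    ("frame_buffer", "chroma_mc"), ("iqit", "deblock"),
    ("intra_pred", "deblock"), ("luma_mc", "deblock"),
    ("chroma_mc", "deblock"), ("deblock", "frame_buffer"),
    ("frame_buffer", "display"),
]


def mesh_side(num_pes):
    """Side length of the smallest square mesh with at least num_pes routers."""
    return math.isqrt(num_pes - 1) + 1


def hops(a, b):
    """Links crossed between routers a and b under minimal mesh routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_hops(placement, flows=FLOWS):
    """Sum of hop counts over all communicating PE pairs."""
    return sum(hops(placement[src], placement[dst]) for src, dst in flows)


def row_major(pes, width):
    """Naive placement: fill the mesh row by row in list order."""
    return {pe: divmod(i, width) for i, pe in enumerate(pes)}


side = mesh_side(len(PES))      # eight PEs fit a 3x3 square mesh
naive = row_major(PES, side)

# ASSUMPTION: a hand-made placement that puts the deblocking filter in
# the centre, next to the PEs assumed to feed it. Coordinates are (row, col).
manual = {
    "nal_entropy": (0, 0), "iqit": (0, 1), "intra_pred": (0, 2),
    "frame_buffer": (1, 0), "deblock": (1, 1),
    "display": (2, 0), "luma_mc": (2, 1), "chroma_mc": (2, 2),
}

if __name__ == "__main__":
    print("mesh side:", side)
    print("naive placement hops:", total_hops(naive))
    print("manual placement hops:", total_hops(manual))
```

Under these assumed flows, the hand-made placement needs 17 hops in total against 22 for the row-major one. The real decoder's traffic pattern, and therefore its best mapping, may differ.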
The final design is tested and verified on the DE4 development board, which uses Altera's Stratix IV GX FPGA. At the time of submission, the implementation decodes 100 frames in 33 seconds at a frame size of 192x144 pixels and 100 frames in 56 seconds at 320x240 pixels. Documentation and source code for the entire project will be released into the public domain. We hope this will enable other researchers to replicate our results and compare against them, and that it will encourage and facilitate further research in image processing, computer vision, advanced VHDL design, and FPGAs.
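To put the reported timings in more familiar units, the short sketch below converts them to frames per second and pixels per second. Only the frame counts, durations, and frame sizes come from the abstract. The function names are hypothetical, and the pixel rate counts one sample per pixel position (frame width times height), which is an assumption about how to normalize across resolutions.

```python
def frames_per_second(frames, seconds):
    """Average decode rate in frames per second."""
    return frames / seconds


def pixels_per_second(width, height, frames, seconds):
    """Average decode rate in pixel positions per second (width * height per frame)."""
    return width * height * frames / seconds


# Figures reported in the abstract: (width, height, frames, seconds).
REPORTED = [
    (192, 144, 100, 33),
    (320, 240, 100, 56),
]

if __name__ == "__main__":
    for width, height, frames, seconds in REPORTED:
        fps = frames_per_second(frames, seconds)
        pps = pixels_per_second(width, height, frames, seconds)
        print(f"{width}x{height}: {fps:.2f} frames/s, {pps:.0f} pixels/s")
```

This works out to roughly 3.0 frames/s at 192x144 and 1.8 frames/s at 320x240. Measured in pixels per second, the larger frame size is processed at the higher rate.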
