Journal of Real-Time Image Processing

Custom Parallel Caching Schemes For Hardware-accelerated Image Compression


Abstract

In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, deploying these algorithms on field-programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism of these platforms and the flexibility they afford designers. Typically, video data are stored in large, slow external memory arrays, but the impact of the memory access bottleneck can be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses arising from many compression algorithms depends on the input data (Jain in Proceedings of the IEEE, pp. 349-389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use and in turn reduce the extent to which an application can be accelerated. In this paper, we present a hybrid memory sub-system that captures data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured difference pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation that employs only an SPM, performance improvements of up to 1.7× and 1.4× are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movement. In addition, reductions of on-chip memory resources by up to 3.2× are achievable using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use when memory accesses are data dependent.
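
The abstract does not spell out how the cache is organized, so the following is only a toy illustration of why 2D spatial locality matters, not the authors' design. It counts how many cache lines must be fetched to cover a square reference block whose position is data dependent (for example, chosen by a motion search), comparing conventional row-shaped lines with square tiles of the same capacity. The function names, the 8×8 block size, and the 16-pixel line capacity are all assumptions made for this sketch.

```python
from itertools import product


def lines_touched(x0, y0, block_w, block_h, line_w, line_h):
    """Count distinct cache lines covering a block_w x block_h block at (x0, y0).

    Assumption: each cache line holds one line_w x line_h tile of the frame.
    """
    cols = (x0 + block_w - 1) // line_w - x0 // line_w + 1
    rows = (y0 + block_h - 1) // line_h - y0 // line_h + 1
    return cols * rows


def total_fetches(line_w, line_h, block=8, span=16):
    """Sum cold-miss line fetches over every block origin in a span x span window.

    Sweeping all origins stands in for data-dependent (unpredictable) offsets.
    """
    return sum(
        lines_touched(x0, y0, block, block, line_w, line_h)
        for x0, y0 in product(range(span), repeat=2)
    )


# Both line shapes hold 16 pixels; only the geometry differs (hypothetical sizes).
row_lines = total_fetches(line_w=16, line_h=1)   # 1D, row-shaped lines
tile_lines = total_fetches(line_w=4, line_h=4)   # 2D, square tiles
print(row_lines, tile_lines)
```

Under these assumptions the sweep needs 2944 line fetches with row-shaped lines and 1936 with 4×4 tiles, about a third fewer. Individual positions can still favor rows (an 8×8 block at origin (5, 3) touches 9 tiles but only 8 row lines), so the benefit is an average over unpredictable offsets rather than a guarantee for every access.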
