Published in: Concurrency and Computation: Practice and Experience

Dynamic partitioning-based JPEG decompression on heterogeneous multicore architectures



Abstract

With the emergence of social networks and improvements in computational photography, billions of JPEG images are shared and viewed every day. Desktops, tablets, and smartphones constitute the vast majority of hardware platforms used for displaying JPEG images. Although these platforms are heterogeneous multicores, no approach exists yet that combines a system's CPU and graphics processing unit (GPU) for JPEG decoding. In this paper, we introduce a novel JPEG decoding scheme for heterogeneous architectures consisting of a CPU and a general-purpose GPU. We employ an offline profiling step to determine the performance of a system's CPU and GPU with respect to JPEG decoding. For a given JPEG image, our performance model uses (1) the CPU and GPU performance characteristics, (2) the image entropy, and (3) the width and height of the image to balance the JPEG decoding workload on the underlying hardware. Our run-time partitioning and scheduling scheme exploits task, data, and pipeline parallelism by scheduling the non-parallelizable entropy-decoding task on the CPU, whereas inverse discrete cosine transformations, color conversions, and upsampling are conducted on both the CPU and the GPU. We have implemented the proposed method in the context of the libjpeg-turbo library, which is an industrial-strength JPEG encoding and decoding engine. Libjpeg-turbo's hand-optimized SIMD routines for ARM and x86 architectures constitute a competitive yardstick for comparison with the proposed approach. We have evaluated our approach on a total of 7194 JPEG images across four high-end and middle-end CPU–GPU combinations, including a mobile GPU. We achieve speedups of up to 5.2× over the SIMD version of libjpeg-turbo, and speedups of up to 10.5× over its sequential code. Taking into account the non-parallelizable JPEG entropy-decoding part, our approach achieves up to 97% of the theoretically attainable maximal speedup, with an average of 94%.
Copyright © 2015 John Wiley & Sons, Ltd.
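
The workload-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's performance model: it assumes that offline profiling yields a single throughput figure per device, and it ignores image entropy, data transfers, and pipelining. The names `DeviceRates`, `split_rows`, and `amdahl_bound` are hypothetical. The bound shown is plain Amdahl's law, used here only as one common way to reason about a speedup limit when entropy decoding stays serial; the abstract does not say how the paper defines its theoretical maximum.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DeviceRates:
    """Hypothetical profiled throughputs for the parallel decoding stages
    (IDCT, color conversion, upsampling), in image rows per millisecond."""
    cpu_rows_per_ms: float
    gpu_rows_per_ms: float


def split_rows(total_rows: int, rates: DeviceRates) -> Tuple[int, int]:
    """Split rows in proportion to throughput so that, under this simple
    model, the CPU and the GPU finish at about the same time.
    Returns (cpu_rows, gpu_rows)."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    total_rate = rates.cpu_rows_per_ms + rates.gpu_rows_per_ms
    if total_rate <= 0:
        raise ValueError("at least one device must have positive throughput")
    gpu_rows = round(total_rows * rates.gpu_rows_per_ms / total_rate)
    return total_rows - gpu_rows, gpu_rows


def amdahl_bound(serial_fraction: float, parallel_speedup: float) -> float:
    """Amdahl's law: overall speedup when serial_fraction of the run time
    cannot be parallelized and the rest is accelerated by parallel_speedup."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)


if __name__ == "__main__":
    # Assumed numbers for illustration only, not measurements from the paper.
    rates = DeviceRates(cpu_rows_per_ms=1.0, gpu_rows_per_ms=3.0)
    print(split_rows(100, rates))
    print(amdahl_bound(serial_fraction=0.2, parallel_speedup=4.0))
```

In this simplified model a GPU that is three times faster than the CPU receives three quarters of the rows. A fuller model would also have to account for the entropy-dependent cost of the serial Huffman stage, which is why the abstract lists image entropy as a model input.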
