Journal: International Journal of Parallel Programming
Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture

Abstract

Multicore processors can provide sufficient computing power and flexibility for complex streaming applications, such as high-definition video processing. To reduce hardware complexity and power consumption, a distributed scratchpad memory architecture is considered instead of a cache memory architecture. However, the distributed design poses new programming challenges. The combined complexity of inter-processor communication, synchronization, and workload balancing makes it difficult to exploit all available capabilities and achieve maximal throughput. In this study, we developed an efficient design flow for parallelizing multimedia applications on a distributed scratchpad memory multicore architecture. An application is first partitioned into streaming components and then mapped onto the multicore processors. Generating efficient task partitions and allocating resources appropriately involves various hardware-dependent factors and application-specific characteristics. To test and verify the proposed design flow, three popular multimedia applications were implemented: a full-HD motion JPEG decoder, an object detector, and a full-HD H.264/AVC decoder. For demonstration purposes, the Sony PlayStation®3 (PS3) was selected as the target platform. Simulation results show that, on the PS3, the full-HD motion JPEG decoder built with the proposed design flow can decode about 108.9 frames per second (fps) in the 1080p format. The object detection application can perform real-time object detection at 2.84 fps at 1280 × 960 resolution, 11.75 fps at 640 × 480 resolution, and 62.52 fps at 320 × 240 resolution. The full-HD H.264/AVC decoder application can achieve nearly 50 fps.
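
The abstract does not say how the streaming components are mapped onto cores, so the following is only an illustrative sketch of the workload-balancing problem it mentions, not the authors' design flow. It uses a generic greedy heuristic (heaviest component first, each placed on the least-loaded core). The function name `map_components_to_cores`, the component names, and the cost values are all hypothetical, and the sketch ignores inter-processor communication, synchronization, and scratchpad capacity.

```python
import heapq


def map_components_to_cores(component_costs, num_cores):
    """Greedy mapping sketch (an assumption, not the paper's method).

    component_costs: dict of hypothetical component name -> estimated cost per frame.
    Returns (assignment, loads), both keyed by core index.
    """
    # Min-heap of (current load, core index): the least-loaded core pops first.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}

    # Place the heaviest components first so they are spread across cores.
    ordered = sorted(component_costs.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in ordered:
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))

    loads = {
        core: sum(component_costs[name] for name in names)
        for core, names in assignment.items()
    }
    return assignment, loads


# Hypothetical per-frame costs (arbitrary units) for a JPEG-like decoder pipeline.
example_costs = {
    "entropy_decode": 4.0,
    "dequantize": 1.0,
    "idct": 3.0,
    "color_convert": 2.0,
}
example_assignment, example_loads = map_components_to_cores(example_costs, 2)
```

In a steady-state pipeline the most heavily loaded core bounds throughput, so a balanced mapping like the one this sketch tends to produce is what lets the frame rate scale with the number of cores. A real mapping would also have to weigh the communication and memory constraints left out here.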