Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine

Abstract

How to develop efficient and scalable parallel applications is the key challenge for emerging many-core architectures. We investigate this question by implementing and comparing two parallel H.264 decoders on the Cell architecture. It is expected that future many-cores will use a Cell-like local store memory hierarchy, rather than a non-scalable shared memory. The two implemented parallel algorithms, the Task Pool (TP) and the novel Ring-Line (RL) approach, both exploit macroblock-level parallelism. The TP implementation follows the master-slave paradigm and is highly dynamic, so that in theory perfect load balancing can be achieved. The RL approach is distributed and more predictable in the sense that the mapping of macroblocks to processing elements is fixed. This allows data locality to be better exploited, communication to be overlapped with computation, and communication and synchronization overhead to be reduced. While TP is more scalable in theory, the actual scalability favors RL. Using 16 SPEs, RL obtains a scalability of 12x, while TP achieves only 10.3x. More importantly, the absolute performance of RL is much higher. Using 16 SPEs, RL achieves a throughput of 139.6 frames per second (fps) while TP achieves only 76.6 fps. A large part of the additional performance advantage is due to hiding the memory latency. From the results we conclude that in order to fully leverage the performance of future many-cores, a centralized master should be avoided and the mapping of tasks to cores should be predictable, so that the memory latency can be hidden.
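To make the contrast between the two strategies more concrete, the sketch below simulates a fixed row-to-worker mapping in the spirit of the Ring-Line approach, using POSIX threads and C11 atomics on an ordinary CPU rather than Cell SPEs. It is a minimal illustration, not the paper's implementation: the names MB_WIDTH, MB_HEIGHT, NUM_WORKERS and decode_mb() are hypothetical placeholders, and the dependency check assumes the usual H.264 2D-wave constraint that macroblock (x, y) needs macroblock (x+1, y-1) from the row above.

```c
/*
 * Conceptual sketch (not the Cell/SPE code from the paper): each
 * worker statically owns every NUM_WORKERS-th macroblock row, so a
 * worker only synchronizes with its ring neighbour that owns the
 * row directly above. All identifiers below are illustrative.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define MB_WIDTH    120   /* macroblocks per row (e.g. 1920/16) */
#define MB_HEIGHT    68   /* macroblock rows     (e.g. 1088/16) */
#define NUM_WORKERS   4   /* stand-ins for SPEs                 */

/* progress[y] = number of macroblocks already decoded in row y */
static atomic_int progress[MB_HEIGHT];

static void decode_mb(int x, int y)
{
    /* placeholder for the real macroblock decode kernel */
    (void)x; (void)y;
}

static void *worker(void *arg)
{
    int id = (int)(long)arg;

    /* Static mapping: worker `id` owns rows id, id+NUM_WORKERS, ... */
    for (int y = id; y < MB_HEIGHT; y += NUM_WORKERS) {
        for (int x = 0; x < MB_WIDTH; x++) {
            /* 2D-wave dependency: MB(x, y) needs MB(x+1, y-1),
             * which belongs to the neighbouring worker in the ring. */
            if (y > 0) {
                int need = (x + 2 <= MB_WIDTH) ? x + 2 : MB_WIDTH;
                while (atomic_load(&progress[y - 1]) < need)
                    ;  /* in the real design this wait is overlapped with DMA */
            }
            decode_mb(x, y);
            atomic_fetch_add(&progress[y], 1);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(long)i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);

    printf("decoded %d x %d macroblocks\n", MB_WIDTH, MB_HEIGHT);
    return 0;
}
```

Because the owner of each row is known in advance, a worker can prefetch the reference data for its next macroblocks while waiting on its neighbour, which is the kind of predictability the abstract credits for RL's ability to hide memory latency; a Task Pool variant would instead have a master thread hand out individual macroblock tasks from a shared queue, making such prefetching harder to plan.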
