IEEE Transactions on Circuits and Systems for Video Technology

VisoMT: A Collaborative Multithreading Multicore Processor for Multimedia Applications With a Fast Data Switching Mechanism

Abstract

Multithreading and multicore processing are powerful ways to exploit parallelism in applications and boost system performance. However, extracting sufficient parallelism and achieving data locality with low communication overhead remain important research issues in embedded multithreading/multicore design. This paper presents the design of a fast data switching mechanism between multilevel storage structures in a new multicore architecture, and makes several contributions toward supporting contemporary, sophisticated multimedia applications built on advanced standards such as H.264. The first contribution, collaborative multithreading, tightly unifies a reduced instruction set computer (RISC) and a collaborative multithreading digital signal processor (DSP) to exploit high parallelism and provide sufficient computing power to applications. Each collaborative thread of our DSP is built from a heterogeneous simultaneous multithreading (SMT) single-instruction, multiple-data (SIMD) structure and four media processing cores, which are connected by a fast switch that provides fast data exchange among correlated streams at the thread level. Our second contribution is one-stop streaming processing, which keeps data in the system for as long as possible, until it is no longer needed, making data access more efficient. Our third contribution is a chunk threading programming model, including a thread management library and threading communication directives that reduce data communication and synchronization overhead. By combining coarse-grained and fine-grained threading, programmers can choose a threading level based on the amount of data exchange in a program.
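Why the right threading level depends on the amount of data exchange can be illustrated with a deliberately simple, hypothetical cost model. It is not the paper's chunk threading library: the names `schedule_time`, `best_chunk`, and the per-chunk `sync_cost` are ours. Consecutive work items are grouped into chunks, chunks are dealt to threads round-robin, and every chunk pays a fixed data-exchange/synchronization cost. Small (fine-grained) chunks balance irregular work well but pay that cost often; large (coarse-grained) chunks amortize it at the risk of load imbalance.

```python
"""Toy cost model for choosing a threading granularity (chunk size).

Hypothetical illustration only; not the paper's chunk threading library.
Chunks of consecutive work items are assigned to threads round-robin, and
each chunk pays a fixed data-exchange/synchronization cost `sync_cost`.
"""
from typing import Sequence


def schedule_time(work: Sequence[float], threads: int, chunk: int, sync_cost: float) -> float:
    """Finishing time of the slowest thread under static round-robin chunking."""
    per_thread = [0.0] * threads
    for idx, start in enumerate(range(0, len(work), chunk)):
        owner = idx % threads  # round-robin chunk-to-thread assignment
        per_thread[owner] += sum(work[start:start + chunk]) + sync_cost
    return max(per_thread)


def best_chunk(work: Sequence[float], threads: int, candidates: Sequence[int], sync_cost: float) -> int:
    """Chunk size with the lowest schedule_time; ties go to the smaller chunk."""
    return min(candidates, key=lambda c: (schedule_time(work, threads, c, sync_cost), c))


if __name__ == "__main__":
    # Irregular workload: eight heavy items followed by eight light ones.
    work = [10] * 8 + [1] * 8
    for sync in (0, 1, 50):
        print(sync, best_chunk(work, threads=2, candidates=[1, 4, 8], sync_cost=sync))
```

For this sample workload on two threads, the preferred chunk size moves from 1 to 4 to 8 as the per-chunk cost rises from 0 to 1 to 50. At zero cost, chunk sizes 1 and 4 tie and the smaller one is reported.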
With our proposed techniques and an appropriate programming model, we reduce processing time by 54.9% in H.264 video encoding (common intermediate format (CIF) video at 16.574 f/s) using the 1-virtual independent and streaming processing by open collaborative multithreading (1-VisoMT) configuration, compared with the Texas Instruments C62 core, which has eight functional units. We implemented the design as a prototype chip fabricated in the Taiwan Semiconductor Manufacturing Company Ltd. (TSMC) 0.13-μm process. The die size of the processor core is 16.12 mm², including 414k logic transistors and 34.4 kB of on-chip static random access memory (SRAM). According to post-simulation results, the processor runs at 180 MHz/1.2 V and consumes 245 mW.
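The reported figures can be turned into a few derived quantities with simple arithmetic. This sketch is ours, not the authors'. It assumes that the 16.574 f/s rate was measured at the 180 MHz clock, that the 245 mW figure applies while encoding, and that the 54.9% reduction is measured against the C62 on the same workload; the abstract does not state these explicitly. Under those assumptions it gives a speedup of roughly 2.2×, an implied C62 rate of about 7.5 f/s, about 10.9 million cycles per frame (roughly 27,400 cycles per 16×16 macroblock of a 396-macroblock CIF frame), and about 14.8 mJ per frame.

```python
"""Back-of-the-envelope quantities derived from the figures in the abstract.

Assumptions (not stated in the abstract): 16.574 f/s was measured at 180 MHz,
the 245 mW post-simulation power applies during encoding, and the 54.9%
reduction is relative to the TI C62 running the same encoding workload.
"""

CLOCK_HZ = 180e6          # 180 MHz core clock
FRAMES_PER_S = 16.574     # H.264 CIF encoding rate
POWER_W = 0.245           # 245 mW from post-simulation
TIME_REDUCTION = 0.549    # 54.9% less processing time than the C62 baseline
CIF_W, CIF_H = 352, 288   # CIF resolution in luma pixels
MB_SIZE = 16              # H.264 macroblocks cover 16x16 luma samples


def derived_metrics() -> dict:
    macroblocks = (CIF_W // MB_SIZE) * (CIF_H // MB_SIZE)  # 22 * 18 = 396
    cycles_per_frame = CLOCK_HZ / FRAMES_PER_S
    return {
        # New time = (1 - 0.549) * old time, so speedup = 1 / 0.451.
        "speedup_vs_c62": 1.0 / (1.0 - TIME_REDUCTION),
        # Frame rate the baseline would reach on the same workload.
        "implied_c62_fps": FRAMES_PER_S * (1.0 - TIME_REDUCTION),
        "cycles_per_frame": cycles_per_frame,
        "cycles_per_macroblock": cycles_per_frame / macroblocks,
        "energy_per_frame_mJ": POWER_W / FRAMES_PER_S * 1e3,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:,.2f}")
```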