Journal of VLSI signal processing systems for signal, image, and video technology

A Distributed, Simultaneously Multi-threaded (SMT) Processor With Clustered Scheduling Windows For Scalable DSP Performance

Abstract

A scalable, distributed processor architecture is presented that targets high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme, extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (simultaneously multi-threaded, SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive processor building blocks such as the register file and the scheduling window. It also leads to a distributed architecture model in which independent thread processing units, arithmetic logic units, register files and memories are distributed across the chip and communicate with each other over a special network. A special communication protocol replaces the broadcasting and associative comparison of destination tags in a centralized instruction scheduler with explicit operand transfer instructions, thereby decentralizing control of the data flow to the greatest possible extent. As a result, the processor cycle time depends neither on the issue bandwidth of a single thread nor on the execution bandwidth of the SMT processor. This makes the performance of the architecture scalable with both the number of function units and the number of thread units, without any impact on the processor's cycle time. The performance and scalability of the proposed microarchitecture are demonstrated on a cycle-true simulator using critical signal processing kernels from the MPEG-4 video coding standard.
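
The abstract does not specify the actual instruction encoding or network protocol, so the following Python sketch only illustrates the general contrast it describes, under stated assumptions. In a centralized Tomasulo-style scheduler, a completed result's destination tag is broadcast to every scheduling-window entry, and every pending source tag is compared against it, so the number of comparisons grows with the window size. With explicit operand transfer, the producer names its consumers directly, and the value is routed only to them with no tag comparison. The `Entry` type, the functions `broadcast_wakeup`, `explicit_transfer` and `build_window`, and the `(cluster_id, slot, operand)` destination format are all hypothetical, and the counts are illustrative, not results from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical scheduling-window slot holding two source operands."""
    name: str
    src_tags: list  # tag of each pending source; None once the value has arrived
    values: list = field(default_factory=lambda: [None, None])


def broadcast_wakeup(window, tag, value):
    """Centralized Tomasulo-style wakeup: the destination tag is broadcast and
    every pending source tag in the window is compared against it.
    Returns the number of associative comparisons performed."""
    compares = 0
    for entry in window:
        for i, pending in enumerate(entry.src_tags):
            if pending is not None:
                compares += 1
                if pending == tag:
                    entry.src_tags[i] = None
                    entry.values[i] = value
    return compares


def explicit_transfer(clusters, destinations, value):
    """Explicit operand transfer (assumed encoding): the producer lists each
    consumer as (cluster_id, slot, operand), so the value is written straight
    into that slot with no tag comparison. Returns the number of writes."""
    for cluster_id, slot, operand in destinations:
        entry = clusters[cluster_id][slot]
        entry.src_tags[operand] = None
        entry.values[operand] = value
    return len(destinations)


def build_window(n):
    """Entry 0 consumes tag "r1"; the other n - 1 entries wait on unrelated tags."""
    entries = [Entry("add0", ["r1", "r2"])]
    entries += [Entry(f"op{i}", [f"t{i}a", f"t{i}b"]) for i in range(1, n)]
    return entries


if __name__ == "__main__":
    for n in (8, 32, 128):
        compares = broadcast_wakeup(build_window(n), "r1", 42)
        # Same entries split into 4 clusters of n // 4 slots each.
        flat = build_window(n)
        clusters = [flat[k:k + n // 4] for k in range(0, n, n // 4)]
        writes = explicit_transfer(clusters, [(0, 0, 0)], 42)
        print(f"window={n}: broadcast compares={compares}, explicit writes={writes}")
    # window=8: broadcast compares=16, explicit writes=1
    # window=32: broadcast compares=64, explicit writes=1
    # window=128: broadcast compares=256, explicit writes=1
```

Under these assumptions, the broadcast cost grows linearly with the number of pending sources in the window, while the explicit-transfer cost depends only on how many consumers the producer names. This is consistent with the abstract's claim that cycle time need not grow with issue or execution bandwidth.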