International Workshop on Advanced Parallel Processing Technologies (APPT 2005); October 27-28, 2005; Hong Kong (CN)

Static Partitioning vs Dynamic Sharing of Resources in Simultaneous MultiThreading Microarchitectures

Abstract

Simultaneous MultiThreading (SMT) achieves better system resource utilization and higher performance because it exploits Thread-Level Parallelism (TLP) in addition to "conventional" Instruction-Level Parallelism (ILP). Theoretically, system resources in every pipeline stage of an SMT microarchitecture can be dynamically shared. However, in commercial applications, all the major queues are statically partitioned. From an implementation point of view, static partitioning of resources is easier to implement and has lower hardware overhead and power consumption. In this paper, we strive to quantitatively determine the tradeoff between static partitioning and dynamic sharing. We find that statically partitioning either the instruction fetch queue (IFQ) or the reorder buffer (ROB) alone is not sufficient (worst-case performance decreases of 3% and 9%, respectively, compared with dynamic sharing), whereas statically partitioning both the IFQ and the ROB achieves an average performance gain of at least 9%, and as much as 148% when running floating-point benchmarks, compared with dynamic sharing. We varied the number of functional units in an effort to isolate the reason for this performance improvement. We found that statically partitioning both queues outperformed all the other partitioning mechanisms under the same system configuration. This demonstrates that the performance gain is achieved by moving from dynamic sharing to static partitioning of the system resources.
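The difference between the two policies can be illustrated with a minimal sketch. This is a toy model, not the simulator or workloads used in the paper, and it does not reproduce the reported 9% or 148% figures. The class `SharedQueue`, the function `simulate`, the queue capacity of 32, and the "stalled thread" scenario are all hypothetical choices made for illustration. The scenario shows one plausible mechanism, assumed here rather than stated in the abstract: under dynamic sharing, a thread that allocates entries without retiring them can occupy the whole queue and starve the other thread, while static partitioning caps each thread at its own share.

```python
class SharedQueue:
    """Toy model of a hardware queue (such as an IFQ or ROB) used by SMT threads."""

    def __init__(self, capacity, num_threads, static):
        self.capacity = capacity
        self.num_threads = num_threads
        self.static = static  # True: equal per-thread partitions; False: dynamic sharing
        self.occupancy = [0] * num_threads

    def limit(self, tid):
        """Maximum number of entries thread `tid` may hold under the chosen policy."""
        if self.static:
            return self.capacity // self.num_threads
        # Dynamic sharing: a thread may use every entry the other threads leave free.
        held_by_others = sum(self.occupancy) - self.occupancy[tid]
        return self.capacity - held_by_others

    def try_allocate(self, tid):
        """Allocate one entry for `tid` if the policy allows it."""
        if self.occupancy[tid] < self.limit(tid):
            self.occupancy[tid] += 1
            return True
        return False

    def release(self, tid):
        """Free one entry held by `tid`, if it holds any."""
        if self.occupancy[tid] > 0:
            self.occupancy[tid] -= 1


def simulate(static, cycles=100, capacity=32):
    """Hypothetical scenario: thread 0 is stalled (allocates, never releases),
    thread 1 allocates and releases one entry per cycle.
    Returns how many entries thread 1 managed to allocate."""
    queue = SharedQueue(capacity=capacity, num_threads=2, static=static)
    progress = 0
    for _ in range(cycles):
        queue.try_allocate(0)        # stalled thread keeps grabbing entries
        if queue.try_allocate(1):    # healthy thread needs one free entry
            progress += 1
            queue.release(1)
    return progress


if __name__ == "__main__":
    print("static partitioning:", simulate(static=True))
    print("dynamic sharing:   ", simulate(static=False))
```

In this toy scenario the healthy thread makes progress in every cycle under static partitioning, but stops once the stalled thread has filled the dynamically shared queue. Real SMT pipelines have fetch policies and flush mechanisms that this sketch deliberately ignores.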