Journal: IEEE Transactions on Parallel and Distributed Systems
Scalability aspects of instruction distribution algorithms for clustered processors

Abstract

Evolving submicron technology makes decentralized designs particularly attractive. A common form of decentralization adopted in processors is to partition the execution core into multiple clusters, each with a small instruction window and a set of functional units. A number of algorithms have been proposed for distributing instructions among the clusters. The first part of this paper analyzes, both qualitatively and quantitatively, how various hardware parameters (the type of cluster interconnect, the fetch size, the cluster issue width, the cluster window size, and the number of clusters) affect the performance of different instruction distribution algorithms. The study shows that the relative performance of the algorithms is very sensitive to these hardware parameters, and that the algorithms that perform relatively better with four or fewer clusters are generally not the best ones for a larger number of clusters. This is important because, with an imminent increase in the transistor budget, more clusters are expected to be integrated on a single chip. The second part of the paper investigates alternative interconnects that provide scalable performance as the number of clusters increases. In particular, it investigates two hierarchical interconnects (a single ring of crossbars and multiple rings of crossbars) as well as instruction distribution algorithms that take advantage of them. Our study shows that these new interconnects, with the appropriate distribution techniques, achieve an IPC (instructions per cycle) that is 15-20 percent better than the most scalable existing configuration and within 2 percent of that achieved by a hypothetical ideal processor with a 1-cycle-latency crossbar interconnect. These results confirm the utility and applicability of hierarchical interconnects and hierarchical distribution algorithms in clustered processors.
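To make the idea of distributing instructions among clusters concrete, the following is a minimal Python sketch of one generic heuristic: steer each instruction to the cluster that produced one of its source operands, and fall back to the least-loaded cluster otherwise. It is an illustration only and is not any of the algorithms evaluated in the paper. The names `Instr` and `distribute`, the register labels, and the treatment of `window_size` as a soft cap with no instruction retirement are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str            # destination register name (hypothetical label)
    srcs: tuple = ()     # source register names


def distribute(instrs, num_clusters, window_size):
    """Toy dependence-based steering (illustrative, not from the paper).

    Returns the cluster chosen for each instruction and the number of
    source operands that had to come from a different cluster.
    """
    load = [0] * num_clusters   # instructions placed per cluster (never drained here)
    producer = {}               # register -> cluster that last wrote it
    placement = []
    remote_operands = 0

    for ins in instrs:
        # Prefer a cluster that already holds a producer of one of our sources.
        target = None
        for src in ins.srcs:
            c = producer.get(src)
            if c is not None and load[c] < window_size:
                target = c
                break
        # Otherwise fall back to the least-loaded cluster (lowest index on ties).
        if target is None:
            target = min(range(num_clusters), key=lambda c: load[c])

        load[target] += 1
        # Count operands whose producer sits in another cluster.
        remote_operands += sum(
            1 for src in ins.srcs
            if src in producer and producer[src] != target
        )
        producer[ins.dest] = target
        placement.append(target)

    return placement, remote_operands


if __name__ == "__main__":
    program = [
        Instr("r1"),
        Instr("r2"),
        Instr("r3", ("r1", "r2")),
        Instr("r4", ("r3",)),
    ]
    placement, remote = distribute(program, num_clusters=2, window_size=4)
    print(placement, remote)  # prints: [0, 1, 0, 0] 1
```

In this toy model, each remote operand stands for one transfer over the cluster interconnect. Its cost would depend on the interconnect type and the number of clusters, which are the parameters the abstract identifies as decisive.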