Initial results on the performance and cost of vector microprocessors

Abstract

Increasingly wider superscalar processors are experiencing diminishing performance returns while requiring larger portions of die area dedicated to control rather than datapath. As an alternative to using these processors to exploit parallelism effectively, we are investigating the viability of using single-chip vector microprocessors. This paper presents some initial results of our investigation, where we compare the performance and cost of vector microprocessors to that of aggressive, out-of-order superscalar microprocessors.

On the performance side, we show that vector processors are able to execute a highly parallel, integer-based application 1.5 to 7.3 times faster than superscalar processors can by exploiting parallelism more effectively. This ability stems from the use of vector instructions. Vector instructions exploit parallelism across loop iterations by implicitly re-scheduling operations and temporally localizing the parallelism. Vector instructions also reduce instruction bandwidth by more than an order of magnitude because they express an abundance of parallelism in a compact encoding.

On the cost side, we show that, to achieve these performance gains, highly parallel, integer-based vector microprocessors are no more costly to implement than existing in-order and out-of-order superscalar microprocessors. One reason for this is that the organization of a vector register file provides tremendous bandwidth without incurring a large area penalty. A second reason is that the control logic for issuing vector instructions is relatively simple.

Both the performance gains and cost savings are possible because vector processors rely on a vectorizing compiler, rather than hardware, to detect parallelism and to express it in a compact form to the hardware. These initial results suggest that transferring this functionality to the compiler offers a tremendous performance/cost benefit.
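The instruction-bandwidth claim in the abstract can be made concrete with a small counting model. The sketch below is not taken from the paper: the maximum vector length (`MAX_VL`), the per-iteration instruction mixes, and the function names are all illustrative assumptions. It performs the same element-wise add once per element and once per strip-mined vector strip, and tallies how many instructions each version would have to fetch under those assumptions.

```python
# Illustrative model only: the per-iteration instruction counts and the
# maximum vector length (MAX_VL) are assumptions, not figures from the paper.

MAX_VL = 64  # hypothetical maximum vector length, in elements


def scalar_add(a, b):
    """Element-wise add, one element per loop iteration."""
    out = []
    fetched = 0
    for x, y in zip(a, b):
        out.append(x + y)
        # Assumed scalar mix: 2 loads, add, store, index update, branch.
        fetched += 6
    return out, fetched


def vector_add(a, b, max_vl=MAX_VL):
    """Strip-mined add: each strip covers up to max_vl elements."""
    out = []
    fetched = 0
    for start in range(0, len(a), max_vl):
        vl = min(max_vl, len(a) - start)
        # One vector add stands in for vl independent scalar adds.
        out.extend(x + y for x, y in zip(a[start:start + vl],
                                         b[start:start + vl]))
        # Assumed vector mix: set length, 2 vector loads, vector add,
        # vector store, pointer update, branch.
        fetched += 7
    return out, fetched


a = list(range(1024))
b = list(range(1024, 2048))
scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)
print(scalar_count, vector_count, scalar_count / vector_count)
```

Under these assumed numbers, the 1024-element loop needs 6144 scalar instruction fetches but only 112 fetches in the strip-mined form, a reduction of roughly 55 times. The exact ratio depends entirely on the assumed vector length and instruction mix; the point is only that one vector instruction encodes work that a scalar loop spreads across many fetched instructions.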