A Performance Study of Out-of-order Vector Architectures and Short Registers


Abstract

This paper presents a study of the impact of reducing the vector register length in an out-of-order vector architecture. In traditional in-order vector architectures, long vector registers have typically been the norm. We start by presenting data showing that, even for highly vectorizable codes, only a small fraction of the elements of a long vector register are actually used. We also show that reducing the register size in a traditional vector architecture, in an attempt to reduce hardware cost and maximize register utilization, results in severe performance degradation.

However, when we combine out-of-order execution and short registers, our simulations show that the performance penalty can be made very small. Moreover, this new architecture tolerates memory latency much better than a traditional machine and uses the storage space in each register more efficiently. We present results for a selection of the SPECfp92 and Perfect Club codes that show speedups of the out-of-order machine over the traditional machine in the range 1.1 to 1.6. Halving the register size (from 16Kb in the out-of-order machine down to 8Kb) yields speedups around 1.3 and as high as 1.6. Even when the register length is reduced to 1/4 of the original size, speedups are still around 1.2, and at a register length of 16 elements (1/8 of the original) most programs perform very close to the traditional in-order vector machine.
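As a rough illustration of the register-utilization argument in the abstract, the sketch below strip-mines a loop onto vector registers of a given maximum length and reports the fraction of register element slots actually touched. The register lengths 128, 64, 32 and 16 follow from the abstract's statement that 16 elements is 1/8 of the original length. The function names and the trip count of 20 are assumptions chosen for illustration. The paper's own figures come from simulating real benchmarks, not from a model like this.

```python
def strip_lengths(n, mvl):
    """Split a loop of n iterations into strips of at most mvl elements."""
    full, rest = divmod(n, mvl)
    return [mvl] * full + ([rest] if rest else [])


def register_utilization(n, mvl):
    """Fraction of register element slots used across all strips."""
    strips = strip_lengths(n, mvl)
    if not strips:
        return 0.0
    # Each strip occupies one register of mvl slots, even if partly filled.
    return n / (len(strips) * mvl)


if __name__ == "__main__":
    n = 20  # hypothetical loop trip count, not taken from the paper
    for mvl in (128, 64, 32, 16):
        strips = strip_lengths(n, mvl)
        print(f"mvl={mvl:3d} strips={len(strips)} "
              f"utilization={register_utilization(n, mvl):.3f}")
```

Under this simple model a short loop fills only a small part of a 128-element register, while shorter registers waste fewer slots at the cost of more strips. It says nothing about execution time, which is where out-of-order issue matters in the paper.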


Bibliographic record

Source: 1998 International Conference on Supercomputing, 1998, pp. 37-44 (8 pages)
Conference location: Melbourne (AU)
Authors: Luis Villa; Roger Espasa; Mateo Valero
Author affiliation (all three authors): Departament d'Arquitectura de Computadors, Universitat Politecnica de Catalunya-Barcelona, Spain
Original format: PDF
Language: English
Chinese Library Classification: Computer applications
Indexed: 2022-08-26 14:03:09

Similar literature


1. O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference [J]. Geng Tong, Li Ang, Wang Tianqi. IEEE Transactions on Parallel and Distributed Systems. 2021, No. 1
2. Architecture, performance modeling and VLSI implementation methodologies for ASIC vector processors: A case study in telephony workloads [J]. Vassilios A. Chouliaras, Konstantia Koutsomyti, Simon Parr. Microprocessors and Microsystems. 2013, No. 8ptaD
3. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures [J]. Tramm John R., Gunow Geoffrey, He Tim. Computer Physics Communications. 2016
4. A Performance Study of Out-of-order Vector Architectures and Short Registers [C]. International Conference on Supercomputing. 1998
5. Vectorization and Register Reuse in High Performance Computing [D]. Stock, Kevin Alan. 2014
6. An explorative study from the Norwegian Quality Register Gastronet comparing self-estimated versus registered quality in colonoscopy performance [O]. Volker Moritz, Oyvind Holme, Marissa Leblanc. 2016
7. Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures [O]. Govindarajan R, Yang Hongbo, Amaral Jose Nelson. 2003
8. Vector Performance of Register-to-Register Vector Computers [R]. Bucher, I. Y. 1988
