STRAIGHT: Realizing a Lightweight Large Instruction Window by using Eventually Consistent Distributed Registers

Abstract

As the number of cores and the network size in a processor chip increase, the performance of each core becomes more critical to the total chip performance. However, to improve the total chip performance, the performance per unit power or per unit area must be improved, which makes it difficult to adopt the conventional approach of superscalar extension. In this paper, we explore a new core structure that is suitable for manycore processors. We revisit prior studies of new instruction-level parallelism (ILP) and thread-level parallelism (TLP) architectures and propose our novel STRAIGHT processor architecture. By introducing the scheme of a distributed key-value store to the register file of clustered microarchitectures, STRAIGHT directly executes operations on a large set of logical registers, each of which is written only once. By discussing the processor structure, microarchitecture, and code model, we show that STRAIGHT realizes both a large instruction window and lightweight, rapid execution, while suppressing the hardware and energy cost. Preliminary estimation results are promising, and show that STRAIGHT improves single-thread performance by about 30% (geometric mean over the SPEC CPU 2006 benchmark suite) without significantly increasing the power and area budget.

[...] programs. For scale-out applications, we assume a manycore processor structure, which consists of a number of STRAIGHT architecture cores (SAC) that are loosely connected to each other. As this is the first report on this novel processor architecture, we discuss the concept behind STRAIGHT, propose basic principles, and estimate the expected performance and budget. The rest of the paper consists of the following sections. Section II revisits studies of new architectures that were designed to improve the ILP/TLP performance of superscalar processors, and discusses the dilemma between the scalability approach and the quick-worker approach. In Section III, we discuss the key idea of STRAIGHT, which resolves this dilemma by introducing a distributed key-value store to the processor architecture. Outline software and hardware models of STRAIGHT are described in Section IV. Section V estimates the performance of STRAIGHT by using a cycle-accurate superscalar simulator with plausible parameters, and also estimates hardware budgets. Finally, we summarize the paper in Section VI.
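The abstract describes logical registers that are written only once and are organized like a key-value store. As a rough illustration of that idea only, the following Python sketch models a write-once register store and a toy scheduler that runs each instruction as soon as its source registers have been written. The names WriteOnceRegisters, run, and demo, the instruction tuple format, and the scheduling loop are all assumptions made for this example; they are not taken from the paper and do not represent the STRAIGHT microarchitecture.

```python
class WriteOnceRegisters:
    """Toy key-value register store: each register key may be written once.

    Hypothetical illustration only; not the paper's hardware design.
    """

    def __init__(self):
        self._values = {}

    def is_ready(self, key):
        # A register is ready once its single producer has written it.
        return key in self._values

    def write(self, key, value):
        if key in self._values:
            raise ValueError(f"register {key} was already written")
        self._values[key] = value

    def read(self, key):
        return self._values[key]


def run(program):
    """Execute (dest, op, sources) tuples in dataflow order.

    Because every register has exactly one writer, an instruction only
    has to wait until its sources are ready; no renaming is modeled.
    """
    regs = WriteOnceRegisters()
    pending = list(program)
    while pending:
        waiting = []
        for dest, op, sources in pending:
            if all(regs.is_ready(s) for s in sources):
                regs.write(dest, op(*(regs.read(s) for s in sources)))
            else:
                waiting.append((dest, op, sources))
        if len(waiting) == len(pending):
            raise RuntimeError("no instruction can make progress")
        pending = waiting
    return regs


# Instructions are deliberately listed out of dependency order.
demo = [
    (2, lambda a, b: a + b, (0, 1)),  # r2 = r0 + r1
    (0, lambda: 3, ()),               # r0 = 3
    (1, lambda: 4, ()),               # r1 = 4
    (3, lambda a, b: a * b, (2, 2)),  # r3 = r2 * r2
]

print(run(demo).read(3))  # 49
```

In this toy model, the single-writer rule is what lets a consumer treat "the key exists" as "the value is final", which is the property the sketch is meant to show.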
