Journal: Computer Architecture News

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Abstract

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. 
Compared to a 180 nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.