Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

Michael Bedford Taylor; Walter Lee; Jason Miller; David Wentzlaff; Ian Bratt; Ben Greenwald; Henry Hoffmann; Paul Johnson; Jason Kim; James Psota; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Matt Frank; Saman Amarasinghe; Anant Agarwal

Abstract

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware. 
Compared to a 180 nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.

Bibliographic details

Source: Computer Architecture News, 2004, No. 2, pp. 2-13 (12 pages)
Author affiliation: CSAIL, Massachusetts Institute of Technology
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology

