Journal: Computer Architecture News

A Memory System Design Framework: Creating Smart Memories

Abstract

As CPU cores become building blocks, we see a great expansion in the types of on-chip memory systems proposed for chip multiprocessors (CMPs). Unfortunately, designing the cache and protocol controllers to support these memory systems is complex, and their concurrency and latency characteristics significantly affect the performance of any CMP. To address this problem, this paper presents a microarchitecture framework for cache and protocol controllers, which can aid in generating the RTL for new memory systems. The framework consists of three pipelined engines (request tracking, state manipulation, and data movement), which are programmed to implement a higher-level memory model. This approach simplifies the design and verification of CMP systems by decomposing the memory model into sequences of state and data manipulations. Moreover, implementing the framework itself produces a polymorphic memory system.

To validate the approach, we implemented a scalable, flexible CMP in silicon. The memory system was then programmed to support three disparate memory models: cache-coherent shared memory, streams, and transactional memory. Measured overheads of this approach seem promising. Our system generates controllers with performance overheads of less than 20% compared to an ideal controller with zero internal latency. Even the overhead of directly implementing a fully programmable controller was modest. While it did double the controller's area, the amortized effective area in the system grew by roughly 7%.
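As a rough illustration of what "decomposing the memory model into sequences of state and data manipulations" could look like, the following Python sketch models a controller that runs a small step list on a read miss. It is not the paper's design: the class `Controller`, the step list `READ_MISS`, the state names `I`, `pending`, and `S`, and the action strings are all hypothetical, and the steps run sequentially here rather than in pipelined hardware engines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str]  # (engine name, action); both are illustrative labels

# Hypothetical "program" for a read miss in a toy invalid/shared model,
# expressed as a sequence of state and data manipulations.
READ_MISS: List[Step] = [
    ("state", "I->pending"),   # state manipulation: mark the line as in flight
    ("data", "fill"),          # data movement: copy the line from backing memory
    ("state", "pending->S"),   # state manipulation: line is now readable
]

@dataclass
class Controller:
    memory: Dict[int, int]                                # backing store
    cache: Dict[int, int] = field(default_factory=dict)   # local copies
    state: Dict[int, str] = field(default_factory=dict)   # per-line state
    outstanding: Set[int] = field(default_factory=set)    # request tracking
    log: List[Step] = field(default_factory=list)         # steps executed so far

    def read(self, addr: int) -> int:
        if self.state.get(addr, "I") == "S":
            return self.cache[addr]            # hit: no program is run
        # Request tracking: allow one outstanding request per address.
        # (This toy model is sequential, so the check never fires here.)
        if addr in self.outstanding:
            raise RuntimeError("request already outstanding")
        self.outstanding.add(addr)
        for engine, action in READ_MISS:
            self.log.append((engine, action))
            if engine == "state":
                self.state[addr] = action.split("->")[1]
            elif engine == "data" and action == "fill":
                self.cache[addr] = self.memory[addr]
        self.outstanding.discard(addr)
        return self.cache[addr]
```

Under these assumptions, supporting a different memory model would mean supplying different step lists rather than rewriting the controller, which is the kind of reuse the abstract describes.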
