In-Network Coherence Techniques for Scalable Shared Memory.

Abstract

Moore's law has driven the continuous miniaturization of transistor size over the years. This has led to breakthroughs in the domain of computers (desktops, servers, supercomputers, etc.), as well as in the domain of mobile devices. Computers of today have abundant on-chip resources that can be exploited to perform increasingly complex tasks. Similarly, the computation capacity of mobile devices has grown manifold over the years. Today's high-end mobile devices have greater computational capability than the desktops available a few years ago. However, programming general-purpose computers as well as mobile devices is not easy. This dissertation seeks to address the parallel programming challenge existing in the domain of multi-core systems, as well as the distributed programming problem associated with mobile ad-hoc networks. Specifically, we look into providing scalable shared memory for both parallel and distributed mobile computing.

With technology scaling leading to an abundance of on-chip resources and uniprocessor designs providing diminishing returns, the computing industry has moved beyond single-core microprocessors and embraced the many-core wave. It is widely believed that shared memory programming is a key to addressing the parallel programming challenge. To implement shared memory, scalable cache coherence protocol implementations are necessary to allow fast sharing of data among various cores and drive the many-core revolution forward. In this dissertation, we take an in-network approach to address the many-core cache coherence problem. We tackle the ordering and bandwidth overhead problem of snoopy protocols in the interconnect and demonstrate how snoopy protocols, originally proposed for small-scale multiprocessors connected via ordered interconnects, present a viable cache coherence solution for many cores on a single chip connected via unordered interconnects.

To address the cache coherence ordering problem, we present In-network Snoop Ordering (INSO), in which coherence requests from a snoop-based protocol are inserted into the interconnect fabric and the network orders the requests in a distributed manner, creating a global ordering among requests. To address the broadcast bandwidth overhead of snoopy protocols, we propose embedding small in-network coherence filters (INCFs) inside on-chip routers that dynamically track sharing patterns among various cores and filter away redundant snoop requests to save interconnect bandwidth and power. This dissertation also addresses the problem of evaluating future many-core architecture proposals. We have implemented GARNET, a detailed and cycle-level on-chip interconnect model, inside the GEMS full-system simulator. GEMS, along with GARNET, provides a full-system performance and power evaluation framework.

Location-based services accessed by mobile devices are increasingly pervasive. Much of the data processed by such services are distributed near the locations where the processed results are needed. If the mobile devices could conduct some of this computation locally, they could gain three major advantages: (1) ease the bandwidth pressure on already overloaded access networks, (2) lead to quicker response times, and (3) improve battery life of mobile devices. To realize the above vision, there needs to be a way to easily program a collection of mobile devices as a whole, since a lot of programming complexities arise due to the mobile nature of such platforms.

In this dissertation, we present a programming abstraction layer, called the Consistent Shared Memory Layer (CSMlayer), that provides consistent shared memory objects to mobile devices. We argue that consistent shared memory is a key requirement in enabling an easy to program, distributed mobile platform. The key to realizing this vision is porting ideas of consistency and coherence from the multi-core domain to the wireless domain. We illustrate how applications that require data to remain reliable and be consistently accessed are now enabled over mobile nodes. This was impossible earlier.
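
The two in-network ideas named in the abstract, globally ordering snoop requests (INSO) and filtering redundant snoops inside routers (INCF), can be illustrated with a toy model. The sketch below is not the dissertation's design: the class names, the round-robin order numbering, the 64-byte region size, and the per-port "may share" sets are all assumptions chosen only to make the ideas concrete. In particular, the ordering part assumes every order number is eventually used, and the filter assumes the router is told about every downstream sharer; a real design would need mechanisms for both.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class SnoopRequest:
    order: int                              # global order tag (hypothetical scheme)
    source: int = field(compare=False)
    address: int = field(compare=False)


class Source:
    """Tags requests with order numbers from a set disjoint from other nodes' sets."""

    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.issued = 0

    def issue(self, address):
        # Node i uses i, i + N, i + 2N, ... so no two nodes share a number.
        order = self.node_id + self.issued * self.num_nodes
        self.issued += 1
        return SnoopRequest(order, self.node_id, address)


class Destination:
    """Buffers out-of-order arrivals and releases requests in ascending order."""

    def __init__(self):
        self.pending = []
        self.next_order = 0

    def receive(self, request):
        heapq.heappush(self.pending, request)
        released = []
        # Assumption: every order number is eventually used, so this never stalls.
        while self.pending and self.pending[0].order == self.next_order:
            released.append(heapq.heappop(self.pending))
            self.next_order += 1
        return released


class RouterFilter:
    """Per-output-port record of regions that may be cached downstream (hypothetical)."""

    def __init__(self, ports, region_bits=6):
        self.region_bits = region_bits      # 2**6 = 64-byte regions, an assumption
        self.may_share = {port: set() for port in ports}

    def note_sharer(self, port, address):
        # Assumption: called whenever a core reachable via `port` caches the line.
        self.may_share[port].add(address >> self.region_bits)

    def forward_ports(self, in_port, address):
        # Forward a snoop only toward ports that may lead to a sharer.
        region = address >> self.region_bits
        return [port for port, regions in self.may_share.items()
                if port != in_port and region in regions]


if __name__ == "__main__":
    a, b = Source(0, 2), Source(1, 2)
    requests = [a.issue(0x100), b.issue(0x140), a.issue(0x180), b.issue(0x1C0)]
    dest = Destination()
    for arrival in (requests[2], requests[1], requests[0], requests[3]):
        print([r.order for r in dest.receive(arrival)])

    router = RouterFilter(["N", "E", "S", "W"])
    router.note_sharer("E", 0x1000)
    print(router.forward_ports("W", 0x1008), router.forward_ports("W", 0x2000))
```

In this toy model, two destinations that see the same requests in different arrival orders still process them in the same sequence, which is the property a snoopy protocol needs from an unordered interconnect. The filter shows why per-router state can save bandwidth: a snoop for a region with no recorded downstream sharer is not forwarded at all.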

Bibliographic record

Author: Agarwal, Niket
Author affiliation: Princeton University
Degree-granting institution: Princeton University
Subject: Computer Engineering
Degree: Ph.D.
Year: 2011
Pages: 216
Original format: PDF
Language: English