
Application-aware on-chip networks.



Abstract

Multi-hop, packet-based Network-on-Chip (NoC) architectures are widely viewed as the de facto solution for integrating the nodes of many-core architectures, owing to their scalability and their well-controlled, highly predictable electrical properties. The NoC has become an important research focus in recent years because the network plays a critical role in determining the performance and power behavior of a many-core architecture. Most of the innovative solutions proposed for NoC research problems focus on optimizing the NoC in isolation, without exploiting characteristics of the applications or the software stack. This thesis offers a unique perspective: designing high-performance, scalable, and energy-efficient NoCs by utilizing application characteristics. In this thesis, I show that we can design far superior on-chip networks if we understand application behavior and customize the on-chip network accordingly. I propose application-aware approaches for packet scheduling in on-chip networks, hierarchical NoC topologies that exploit the communication locality of applications, and data compression techniques that exploit the value locality inherent in application data traffic.

The first contribution of this thesis is application-aware packet scheduling policies for NoCs. The NoC is likely to become a critical shared resource in future many-core processors. The challenge is to develop policies and mechanisms that enable multiple applications to share the network efficiently and fairly, so as to improve system performance. A key component of a router that can influence application-level performance and fairness is the arbitration/scheduling unit. Existing policies for arbitration and packet scheduling in NoCs are local and application-oblivious. However, we observe that different application characteristics can lead to differential criticality of packets: some packets matter more to processor execution time than others.
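To make the contrast concrete, the following sketch (Python purely for illustration; all names and the rank encoding are hypothetical, not the thesis's hardware design) compares an application-oblivious round-robin arbiter with one that consults per-application ranks:

```python
# Hypothetical sketch: application-oblivious vs. application-aware arbitration.
from dataclasses import dataclass, field
from itertools import count

_seq = count()  # global arrival order, used as an age tie-breaker

@dataclass
class Packet:
    app_id: int
    age: int = field(default_factory=lambda: next(_seq))

def round_robin(contenders, last_served):
    """Application-oblivious baseline: rotate over input ports."""
    n = len(contenders)
    for i in range(1, n + 1):
        port = (last_served + i) % n
        if contenders[port] is not None:
            return port
    return None

def app_aware(contenders, rank):
    """Pick the packet of the highest-ranked application (rank 0 = most
    network-critical); break ties by packet age, oldest first."""
    candidates = [(rank[p.app_id], p.age, port)
                  for port, p in enumerate(contenders) if p is not None]
    return min(candidates)[2] if candidates else None

# Two packets contend for an output port: app 3 outranks app 1.
contenders = [Packet(app_id=1), Packet(app_id=3)]
rank = {1: 2, 3: 0}
print(app_aware(contenders, rank))  # port 1 wins: its application is ranked higher
```

The age tie-breaker stands in for the starvation-avoidance techniques the thesis describes; a real router would implement this comparison in the arbiter's select logic.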
This novel insight enables us to design packet scheduling policies that provide high performance in on-chip networks.

First, I propose a coordinated application-aware prioritization substrate. The idea is to divide processor execution time into phases; within each phase, rank applications by how critical the network is to each application's performance (or by system-level application priorities); and have all routers in the network prioritize packets according to their applications' ranks in a coordinated fashion. Our scheme includes techniques that ensure starvation freedom and enable the enforcement of system-level application priorities, resulting in a configurable substrate for application-aware prioritization in on-chip networks.

Next, I propose a new architecture, Aergia, that exploits slack in packet latency. In this thesis, we define slack as a key measure that characterizes the relative importance of a packet. Specifically, the slack of a packet is the number of cycles the packet can be delayed in the network with no effect on execution time. We propose new router prioritization policies that exploit the available slack of interfering packets in order to accelerate performance-critical packets and thus improve overall system performance. When two packets interfere with each other in a router, the packet with the lower slack value is prioritized. I describe mechanisms to estimate slack, prevent starvation, and combine slack-based prioritization with the application-aware prioritization mechanisms proposed above.

The second contribution of this thesis is application-aware hierarchical topologies. This proposal leverages the insight that applications mapped onto a large CMP system benefit from clustered communication, where data is placed in cache banks close to the cores accessing it. We therefore design a hierarchical network topology that takes advantage of such communication locality.
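The "lower slack wins" rule above can be sketched directly; the `Packet` fields and slack values here are illustrative assumptions, since Aergia estimates slack in hardware rather than carrying a precomputed value:

```python
# Hedged sketch of slack-based prioritization in the spirit of Aergia.
# Slack = cycles a packet can be delayed without affecting execution time.
from dataclasses import dataclass

@dataclass
class Packet:
    pid: str
    slack: int  # estimated cycles of tolerable delay (assumed precomputed here)

def arbitrate(a: Packet, b: Packet) -> Packet:
    """When two packets interfere in a router, prioritize the one with the
    lower slack: it is the more performance-critical of the two."""
    return a if a.slack <= b.slack else b

critical = Packet("load-miss", slack=3)    # will stall the core soon
tolerant = Packet("prefetch", slack=120)   # ample latency tolerance
assert arbitrate(critical, tolerant) is critical
```

Combining this with the ranking substrate amounts to arbitrating on a composite key (application rank first, slack second), which is one plausible reading of the combination the thesis describes.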
The two-tier hierarchical topology consists of local networks connected via a global network. Each local network is a simple, high-bandwidth, low-power shared bus fabric, and the global network is a low-radix mesh. Since most communication in CMP applications can be confined to the local network, using a fast, low-power bus to handle local communication improves both network latency and power efficiency.

The final contribution of this thesis is data compression techniques for on-chip networks. In this context, we examine two configurations that explore combinations of storage and communication compression: (1) Cache Compression (CC) and (2) Compression in the NIC (NC). We also present techniques to hide the decompression latency by overlapping it with communication latency. We comprehensively characterize and quantify the effect of data compression on NoCs. The benefits seen in our evaluations make a strong case for using compression to optimize the performance and power envelope of NoC architectures. I also take advantage of the compressibility of application data traffic to improve throughput via a novel router microarchitecture called XShare. The XShare architecture exploits the data value locality and bimodal traffic characteristics of CMP applications to transfer multiple small flits over a single channel.
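A minimal sketch of the channel-sharing idea, assuming a 128-bit channel and a crude leading-zero notion of compressed width (both assumptions for illustration, not XShare's actual parameters or encoding):

```python
# Illustrative sketch of XShare-style channel sharing (parameters assumed).
# CMP traffic is bimodal: many flits carry narrow values (zeros, small
# integers) that compress well. If two compressed payloads each fit in half
# the channel width, they can share one physical flit-sized transfer.
FLIT_BITS = 128          # assumed channel width
HALF = FLIT_BITS // 2

def compressed_width(value: int) -> int:
    """Width of a payload once leading zeros are dropped (value locality)."""
    return max(value.bit_length(), 1)

def pack(a: int, b: int):
    """Pack two payloads into one channel transfer if both are 'small'
    (fit in half the channel); otherwise they cannot share a flit."""
    if compressed_width(a) <= HALF and compressed_width(b) <= HALF:
        return (a << HALF) | b  # one transfer carries both payloads
    return None

def unpack(word: int):
    return word >> HALF, word & ((1 << HALF) - 1)

a, b = 42, 7  # two narrow payloads typical of bimodal CMP traffic
word = pack(a, b)
assert word is not None and unpack(word) == (42, 7)
```

The real microarchitecture must also tag each shared flit so the receiver can split it and route the halves independently; that bookkeeping is omitted here.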

Bibliographic record

  • Author

    Das, Reetuparna

  • Affiliation

    The Pennsylvania State University

  • Degree-granting institution: The Pennsylvania State University
  • Subject: Computer Engineering
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 152 p.
  • Total pages: 152
  • Format: PDF
  • Language: English

