Software Assists to On-chip Memory Hierarchy of Manycore Embedded Systems

Abstract

The growing computing demands of emerging application domains such as Recognition/Mining/Synthesis (RMS), visual computing, wearable devices, and the Internet of Things (IoT) have driven the move towards manycore architectures to better manage tradeoffs among performance, energy efficiency, and reliability.

The memory hierarchy of manycore architectures has a major impact on their overall performance, energy efficiency, and reliability. We identify three major problems that make traditional memory hierarchies unattractive for manycore architectures and their data-intensive workloads: (1) they are power hungry and a poor fit for manycores in the face of dark silicon, (2) they are not adaptable to the workload's requirements and memory behavior, and (3) they are not scalable due to coherence overheads.

This thesis argues that many of these inefficiencies are the result of software-agnostic, hardware-managed memory hierarchies. Application semantics and behavior captured in software can be exploited to manage the memory hierarchy more efficiently. This thesis exploits some of this information and proposes a number of techniques to mitigate the aforementioned inefficiencies in two broad contexts: (1) explicit management of hybrid cache-SPM memory hierarchies, and (2) exploiting approximate computing for energy efficiency.

We first present the required hardware and software support for a software-assisted memory hierarchy composed of distributed memories that can be partitioned between caches and software-programmable memories (SPMs) at runtime. This memory hierarchy supports local and remote allocations, as well as data movement between SPM and cache and between two physical SPMs. The distributed SPM space is shared among a mix of threads, where each thread explicitly requests SPM space throughout its execution. The runtime component of this hierarchy shares the entire distributed SPM space among contending threads based on an allocation policy. Unlike traditional memory hierarchies, we incorporate no coherence logic in this hierarchy. The program explicitly allocates shared data in the distributed SPM space. For all threads of that program, accesses to shared data are forwarded to the same physical copy.

Next, we augment the caches and SPMs in this hierarchy with approximation support in order to improve the energy efficiency of the memory subsystem when running approximate programs. We present approximation techniques for the major building blocks of our hybrid cache-SPM memory hierarchy. We introduce Relaxed Cache, an approximate private L1 SRAM cache whose quality, capacity, and energy consumption are controlled through two architectural knobs (voltage and the number of acceptable faulty bits per cache block). We then present QuARK Cache, an approximate shared L2 STT-MRAM cache. The read and write current amplitudes provide two knobs for trading off the accuracy of memory operations against dynamic energy consumption. We then introduce Write-Skip, a technique that skips write operations in STT-MRAM data SPMs if the previous value and the new value are approximately equal. Finally, we discuss a quality-configurable memory approximation strategy using formal control theory that adjusts the level of approximation at runtime depending on the desired quality of the program's output.

We implemented all software and hardware components of the proposed software-assisted memory hierarchy in the gem5 architectural simulator. Our simulations on a mix of RMS benchmarks and microbenchmarks show that our proposed techniques achieve better performance, energy, and scalability for manycore systems than traditional hardware-managed memory hierarchies.
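
The Write-Skip idea described in the abstract can be illustrated with a minimal software sketch. The abstract does not say how "approximately equal" is decided, so the absolute-difference test, the `tolerance` parameter, and the `WriteSkipSPM` class below are all hypothetical choices made for illustration; this is a toy model, not the thesis's hardware mechanism.

```python
class WriteSkipSPM:
    """Toy model of a data scratchpad that skips near-identical writes.

    Assumption (not from the source): two values count as "approximately
    equal" when their absolute difference is at most `tolerance`.
    """

    def __init__(self, size, tolerance):
        self.cells = [0] * size      # stored words, all initialised to 0
        self.tolerance = tolerance   # hypothetical approximation knob
        self.writes_performed = 0
        self.writes_skipped = 0

    def write(self, addr, value):
        """Store `value` unless it is close to what the cell already holds.

        Returns True if the write was performed, False if it was skipped.
        """
        old = self.cells[addr]
        if abs(value - old) <= self.tolerance:
            self.writes_skipped += 1  # cell keeps its previous value
            return False
        self.cells[addr] = value
        self.writes_performed += 1
        return True

    def read(self, addr):
        return self.cells[addr]


spm = WriteSkipSPM(size=4, tolerance=2)
spm.write(0, 100)  # far from the initial 0, so the write is performed
spm.write(0, 101)  # within tolerance of the stored 100, so it is skipped
spm.write(0, 110)  # outside tolerance, so the write is performed
```

Because each new value is compared against the value actually stored in the cell, the error visible to a later read stays within `tolerance` in this sketch, even after a long run of skipped writes.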

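Bibliographic record

  • Author affiliation: University of California, Irvine
  • Degree-granting institution: University of California, Irvine
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pagination: 174 p.
  • Total pages: 174
  • Original format: PDF
  • Language of text: English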