Journal: Microelectronics Journal

BRLoop: Constructing balanced retimed loop to architect STT-RAM-based hybrid cache for VLIW processors



Abstract

Spin-Torque Transfer RAM (STT-RAM), an emerging non-volatile memory technology, has been proposed as a replacement for SRAM-based caches. Its commercialization has recently been accelerated by large companies such as Samsung. Although STT-RAM offers non-volatility, high density, and extremely low leakage power, it suffers from high dynamic energy and long latency on write operations. To address this problem, researchers have proposed an STT-RAM/SRAM hybrid structure that mitigates the cost of writes. In hybrid caches, a migration-based technique is often adopted to exploit the advantages of both parts by dynamically moving write-intensive and read-intensive data between STT-RAM and SRAM. However, migrations also introduce extra reads and writes during data movement. For stencil loops with read and write data dependencies, it is observed that the migration overhead is significant and that migrations correlate closely with the interleaved read and write access pattern within a memory block. Loop retiming has been proposed to reduce the migration overhead by changing this interleaved access pattern. Loop retiming has also been studied extensively as a way to maximize the instruction-level parallelism (ILP) of multiple functional units by rearranging the dependence delays in a uniform loop. Both retiming techniques work by changing the instruction dependence delays in a loop. However, the earlier ILP-aware loop retiming is unaware of its impact on migration in the hybrid cache, while the more recent migration-aware loop retiming does not fully consider the parallelism of the arithmetic and logic units (ALUs) in VLIW processors. The impact of retiming on both the migration overhead of the hybrid cache and the ILP of the VLIW processor should therefore be considered when architecting an STT-RAM-based hybrid cache for VLIW processors.
To address this issue, this paper models the impact of loop retiming on both the ILP of the ALUs and the migration overhead in an STT-RAM/SRAM hybrid cache. A balanced loop retiming solution that considers both the ALU part and the memory part is devised to achieve high performance on VLIW processors. Experimental results across a set of benchmarks show that the proposed optimal and heuristic balanced retiming approaches improve overall system performance compared with no retiming, pure migration-aware retiming, and pure ILP-aware retiming.
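The link between interleaved read/write patterns and migration count can be illustrated with a toy model. The sketch below is not the paper's cache model or its retiming algorithm: the function name `count_migrations`, the streak-based policy, and the threshold of 2 are all assumptions chosen for illustration. It only shows that the same set of accesses to one memory block can trigger very different numbers of migrations depending on their order.

```python
def count_migrations(accesses, threshold=2, start="STT"):
    """Count migrations of one block under a hypothetical hybrid-cache policy.

    accesses:  string of 'R' (read) and 'W' (write) accesses to the block.
    threshold: assumed number of consecutive "unfavourable" accesses that
               triggers a migration (writes while in STT-RAM, reads while
               in SRAM).
    start:     where the block initially resides, "STT" or "SRAM".
    """
    location = start
    streak = 0
    migrations = 0
    for op in accesses:
        # Writes are costly in STT-RAM; reads waste scarce SRAM capacity.
        unfavourable = "W" if location == "STT" else "R"
        if op == unfavourable:
            streak += 1
            if streak >= threshold:
                location = "SRAM" if location == "STT" else "STT"
                migrations += 1
                streak = 0
        else:
            streak = 0
    return migrations


# Same six reads and six writes, in two different orders.
interleaved = "RWWRRWWRRWWR"
grouped = "RRRRRRWWWWWW"

print(count_migrations(interleaved), count_migrations(grouped))
```

Under these assumptions the interleaved order causes five migrations and the grouped order one. A real retiming pass cannot reorder accesses freely; it has to preserve the loop's data dependencies, which this sketch ignores.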
