
Summarizing multiprocessor program execution with versatile, microarchitecture-independent snapshots


Abstract

Computer architects rely heavily on software simulation to evaluate, refine, and validate new designs before they are implemented. However, simulation time continues to increase as computers become more complex and multicore designs become more common. This thesis investigates software structures and algorithms for quickly simulating modern cache-coherent multiprocessors by amortizing the time spent to simulate the memory system and branch predictors. The Memory Timestamp Record (MTR) summarizes the directory and cache state of a multiprocessor system in a compact data structure. A single MTR snapshot is versatile enough to reconstruct the microarchitectural state resulting from various coherence protocols and cache organizations. The MTR may be quickly updated by each simulated processor during a fast-forwarding phase and optionally stored off-line for reuse. To fill large branch prediction tables, we introduce Branch Predictor-based Compression (BPC), which compactly stores a branch trace so that it may be used to fill in any branch predictor structure. An entire BPC trace requires less space than single discrete predictor snapshots, and it may be decompressed 3-6x faster than performing functional simulation.
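To make the idea of a microarchitecture-independent snapshot concrete, the following is a minimal sketch, not the thesis's actual MTR layout. It assumes a hypothetical `TimestampRecord` class that stores, per memory block, the time each simulated CPU last touched it, and then rebuilds the contents of a private LRU cache for whatever number of sets and ways is requested. Coherence invalidations and directory state are deliberately left out, and all names and parameters here are illustrative assumptions.

```python
from collections import defaultdict


class TimestampRecord:
    """Hypothetical, simplified timestamp snapshot (an assumption for
    illustration; the real MTR also summarizes directory state)."""

    def __init__(self, num_cpus, block_bytes=64):
        self.num_cpus = num_cpus
        self.block_bytes = block_bytes
        # block number -> per-CPU time of the most recent access (None = never)
        self.last_access = defaultdict(lambda: [None] * num_cpus)

    def access(self, time, cpu, addr):
        """Cheap update performed during fast-forwarding: one store per access."""
        block = addr // self.block_bytes
        self.last_access[block][cpu] = time

    def reconstruct(self, cpu, num_sets, ways):
        """Rebuild one CPU's cache contents for a given organization.

        Under LRU replacement with no invalidations, each set holds the
        `ways` most recently accessed distinct blocks that map to it.
        Returns {set_index: [block numbers, most recently used first]}.
        """
        per_set = defaultdict(list)
        for block, times in self.last_access.items():
            t = times[cpu]
            if t is not None:
                per_set[block % num_sets].append((t, block))
        return {
            s: [block for _, block in sorted(entries, reverse=True)[:ways]]
            for s, entries in per_set.items()
        }


# One snapshot, two different cache organizations.
rec = TimestampRecord(num_cpus=2)
for t, addr in enumerate([0, 64, 128, 192, 0], start=1):
    rec.access(t, cpu=0, addr=addr)

direct_mapped = rec.reconstruct(cpu=0, num_sets=2, ways=1)
fully_assoc = rec.reconstruct(cpu=0, num_sets=1, ways=2)
```

The point of the sketch is that the fast-forwarding phase never commits to a cache geometry: the same record is queried afterwards with different `num_sets` and `ways` values. Handling remote writes, sharers, and protocol-specific states would need additional per-block information that this simplified version does not track.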
