Pre-execution via speculative data-driven multithreading.

Abstract

This dissertation introduces pre-execution, a novel technique for accelerating sequential programs. Pre-execution directly attacks the instructions that cause performance problems: mispredicted branches and loads that miss in the cache. In pre-execution, future branch outcomes and load addresses are computed on the side and the results are fed to the main program. The main program is thereby spared from incurring the full computation latencies of these instructions. Pre-execution exploits out-of-order fetch and decoupling. Out-of-order fetch, that is, fetching and executing only critical load and branch computations while skipping over all unrelated instructions, allows pre-execution to compute values faster than the main program. Decoupling, that is, doing so in a separate thread, isolates stalls that occur in these computations so that they do not directly impact the main program thread.

This dissertation describes speculative data-driven multithreading (DDMT), an implementation of pre-execution. DDMT implements the runtime component of pre-execution, which is responsible for pre-executing computations and communicating the results to the main program, as an extension to a superscalar processor. In addition to using the single cache hierarchy to allow pre-executing computations to prefetch for the main program, DDMT stores individual pre-executed instruction results in the shared physical register file and then passes them one by one to the main program via a novel modification to register renaming called register integration.

For DDMT's setup component, which is responsible for finding load and branch computations and conveying them to the runtime component, this dissertation introduces an algorithm for automatically extracting performance-enhancing computations from program traces. The algorithm evaluates a benefit-cost function over all candidate computations in a trace and chooses those that maximize benefit (latency tolerance) while minimizing cost (execution overhead). The algorithm is formulated to permit software, hardware, and hybrid implementations.

The dissertation includes a simulation-driven performance evaluation of DDMT. Our results show that DDMT achieves 10% to 15% performance improvements for general-purpose integer programs running on an aggressive baseline processor with large caches, with the potential for greater improvements on likely future processor designs. We conclude that pre-execution and DDMT are promising technologies that merit consideration for inclusion in future machines.
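
The selection step described in the abstract (score every candidate computation, keep those whose latency tolerance outweighs their execution overhead) can be illustrated with a minimal Python sketch. The abstract does not give the dissertation's actual benefit-cost function, so everything here is an assumption made for illustration: the `Candidate` fields, the `net_benefit` formula (invocations times the difference between latency tolerated and overhead), and the rule of keeping only candidates with positive net benefit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate load or branch computation found in a trace (hypothetical model)."""
    name: str
    latency_tolerated: float  # cycles of miss/mispredict latency hidden per launch (assumed metric)
    overhead: float           # cycles of extra execution per launch (assumed metric)
    invocations: int          # how many times the computation would be launched in the trace


def net_benefit(c: Candidate) -> float:
    # Assumed benefit-cost function: total latency tolerated minus total overhead.
    # The dissertation's real function is not stated in the abstract.
    return c.invocations * (c.latency_tolerated - c.overhead)


def select_computations(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates with positive net benefit, best first."""
    profitable = [c for c in candidates if net_benefit(c) > 0]
    return sorted(profitable, key=net_benefit, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    trace_candidates = [
        Candidate("load_A", latency_tolerated=40.0, overhead=6.0, invocations=100),
        Candidate("branch_B", latency_tolerated=8.0, overhead=10.0, invocations=500),
        Candidate("load_C", latency_tolerated=20.0, overhead=5.0, invocations=50),
    ]
    for c in select_computations(trace_candidates):
        print(c.name, net_benefit(c))
```

In this toy model a frequently launched computation whose overhead exceeds the latency it hides (`branch_B`) is rejected, which is the kind of trade-off a benefit-cost formulation is meant to capture.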

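Bibliographic record

  • Author

    Roth, Amir

  • Author affiliation

    The University of Wisconsin - Madison

  • Degree-granting institution: The University of Wisconsin - Madison
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 356 p.
  • Total pages: 356
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology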