Journal: Microprocessors and Microsystems

Simultaneous multithreading trace processors: Improving trace processors performance

Abstract

The trace processor is a promising next-generation microarchitecture that exploits implicit thread-level parallelism (TLP) in conventional applications through aggressive control and data speculation. Although trace processors can achieve high performance, their processing element (PE) resources remain underutilized because of frequent trace cache misses and next-trace misspeculations. On a trace cache miss, the trace dispatch engine must stall and supplies nothing to the idle PEs until trace construction completes. On a next-trace misspeculation, the trace dispatch engine also stalls, and in addition all speculative execution results after the misspeculated trace must be discarded, so all the work done on the squashed traces is wasted. The problem becomes more severe as trace processors scale up to more PEs. To address it, we propose adding multiple thread contexts to trace processors. This paper proposes the resulting combined microarchitecture, the simultaneous multithreading (SMT) trace processor. By dispatching traces from other threads, it can tolerate the penalties of trace cache misses and next-trace misspeculations. Introducing multiple thread contexts reduces the percentage of wrong-path speculation for each thread and significantly improves PE execution efficiency. Simulation results show that integrating two thread contexts improves the performance of an 8-PE trace processor by 27.7%. With four and eight thread contexts, the corresponding improvements are 28.7% and 15.4%. We believe that even higher performance improvements can be expected when more PEs are integrated into SMT trace processors.
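The latency-hiding idea in the abstract can be illustrated with a toy model. The sketch below is not the simulator used in the paper. It assumes a dispatcher that issues at most one trace per cycle, a fixed trace-construction penalty on a trace cache miss, and round-robin selection among ready threads. The names `ThreadContext`, `simulate`, and `miss_penalty` are hypothetical, and PEs and next-trace misspeculation are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThreadContext:
    """Hypothetical thread context: a list of trace cache lookups (True = hit)."""
    lookups: List[bool]
    ready_at: int = 0   # first cycle at which this thread can supply a trace
    next_idx: int = 0   # index of the next trace to dispatch

    def done(self) -> bool:
        return self.next_idx >= len(self.lookups)


def simulate(threads: List[ThreadContext], miss_penalty: int) -> Tuple[int, int]:
    """Return (total_cycles, idle_dispatch_cycles) for the toy dispatcher."""
    cycle = 0
    idle = 0
    rr = 0  # round-robin pointer (an assumed selection policy)
    n = len(threads)
    while not all(t.done() for t in threads):
        dispatched = False
        for k in range(n):
            t = threads[(rr + k) % n]
            if t.done() or t.ready_at > cycle:
                continue  # finished, or still waiting for trace construction
            if t.lookups[t.next_idx]:
                # Trace cache hit: dispatch this trace and move on.
                t.next_idx += 1
                rr = (rr + k + 1) % n
                dispatched = True
                break
            # Trace cache miss: the thread waits for trace construction,
            # after which the same lookup is treated as a hit.
            t.lookups[t.next_idx] = True
            t.ready_at = cycle + miss_penalty
            # Keep scanning: another thread may supply a trace this cycle.
        if not dispatched:
            idle += 1  # the dispatch engine supplied nothing this cycle
        cycle += 1
    return cycle, idle


# Usage: one thread with a miss, alone and then paired with a second thread.
single = simulate([ThreadContext([True, False, True])], miss_penalty=3)
paired = simulate(
    [ThreadContext([True, False, True]), ThreadContext([True, True, True])],
    miss_penalty=3,
)
print("single thread (cycles, idle):", single)
print("two threads   (cycles, idle):", paired)
```

In this toy setting, the second thread fills dispatch slots that would otherwise sit idle while the first thread waits on trace construction. The figures it prints are properties of the toy model only and say nothing about the paper's measured speedups.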
