Conference on Computing Frontiers

Kilo-instruction processors, runahead and prefetching

Abstract

There is a continuous research effort devoted to overcoming the memory wall problem. Prefetching is one of the most frequently used techniques: a prefetch mechanism anticipates processor requests by moving data into the levels of the memory hierarchy closest to the processor. Runahead execution is another form of prefetching, based on speculative execution. Under an L2 miss, this mechanism keeps executing instructions speculatively instead of letting the processor stall when the reorder buffer fills completely, which allows it to generate useful prefetches. A third technique for alleviating the memory wall gives processors large instruction windows, avoiding the window stalls caused by in-order commit and long-latency loads. This approach, known as "Kilo-instruction processors", relies on exploiting more instruction-level parallelism by allowing thousands of in-flight instructions while long-latency loads are outstanding in memory.

In this work, we present a comparative study of the three approaches above, showing their key issues and performance tradeoffs. We show that Runahead execution achieves better speedups (30% on average) than traditional prefetch techniques (21% on average). Nevertheless, the Kilo-instruction processor performs best (68% on average). Kilo-instruction processors are not only faster but also generate fewer speculative instructions than Runahead. When the evaluated prefetching mechanism is combined with Runahead and with the Kilo-instruction processor, performance improves further in each case (49.5% and 88.9%, respectively); Kilo-instruction with prefetching still achieves better performance and executes fewer speculative instructions than Runahead.
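
The abstract does not say which "traditional" prefetcher was evaluated. Purely as an illustration of how a hardware prefetcher anticipates requests, the sketch below assumes a classic PC-indexed stride prefetcher: a table remembers the last address and stride seen by each load instruction, and once the same non-zero stride repeats often enough, it requests the next few addresses before the processor asks for them. The names `StridePrefetcher`, `degree` and `threshold` are hypothetical choices for this example, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    last_addr: int        # last address this load touched
    stride: int = 0       # last observed address delta
    confidence: int = 0   # saturating counter, 0..3


class StridePrefetcher:
    """Hypothetical PC-indexed stride prefetcher (illustrative only)."""

    def __init__(self, degree: int = 2, threshold: int = 2) -> None:
        self.table: Dict[int, Entry] = {}
        self.degree = degree          # how many strides ahead to prefetch
        self.threshold = threshold    # confidence needed before prefetching

    def access(self, pc: int, addr: int) -> List[int]:
        """Record a demand access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: just start tracking it.
            self.table[pc] = Entry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            # Pattern broken: learn the new stride and start over.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + k * stride for k in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    pf = StridePrefetcher()
    # A load at PC 0x400 walking an array in 64-byte steps.
    for a in range(0, 320, 64):
        print(a, pf.access(0x400, a))
```

The table needs a few accesses to build confidence before it issues any prefetch, and an irregular access pattern never reaches the threshold, which is one reason a simple prefetcher helps some workloads much more than others.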
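
To see why a full reorder buffer hurts and how Runahead and a large window attack the same stall differently, here is a deliberately crude timing model. It is not the simulator used in the paper. It assumes a 1-wide machine, independent loads, a fixed memory latency, in-order commit from a window of `window` entries, and no pipeline-flush penalty when leaving runahead mode; the function `simulate` and all its parameters are hypothetical. With a small window, a miss at the head blocks dispatch once the window fills. The `runahead` flag approximates Runahead by pre-executing the instructions beyond the window during that stall and turning any miss it meets into a prefetch, while a very large (Kilo-instruction) window simply keeps dispatching so that independent misses overlap.

```python
from typing import List, Optional


def simulate(stream: List[bool], window: int, mem_latency: int,
             runahead: bool = False) -> int:
    """Return the cycle at which the last instruction commits (toy model).

    stream[i] is True if instruction i is a long-latency load (an L2 miss).
    Other instructions take 1 cycle; all loads are assumed independent.
    """
    n = len(stream)
    dispatch = [0] * n
    commit = [0] * n
    # Cycle at which a prefetched miss's data arrives (None = not prefetched).
    ready: List[Optional[int]] = [None] * n

    for i in range(n):
        earliest = dispatch[i - 1] + 1 if i > 0 else 0
        # Instruction i needs the window slot freed when i - window commits.
        if i >= window and commit[i - window] > earliest:
            stall_end = commit[i - window]
            if runahead:
                # While the window is blocked, pre-execute instructions past
                # it (one per cycle) and turn each miss found into a prefetch.
                # The pre-executed results themselves are thrown away.
                cycle, j = earliest, i
                while cycle < stall_end and j < n:
                    if stream[j] and ready[j] is None:
                        ready[j] = cycle + mem_latency
                    cycle += 1
                    j += 1
            earliest = stall_end
        dispatch[i] = earliest

        r = ready[i]
        if stream[i] and r is not None:
            done = max(earliest + 1, r)       # miss already prefetched
        elif stream[i]:
            done = earliest + mem_latency     # full memory latency
        else:
            done = earliest + 1               # short-latency instruction
        # In-order commit: nothing retires before its predecessor.
        commit[i] = max(done, commit[i - 1]) if i > 0 else done
    return commit[-1] if n else 0


if __name__ == "__main__":
    # One independent miss every 200 instructions, 300-cycle memory.
    stream = [i % 200 == 0 for i in range(2000)]
    print("64-entry ROB:", simulate(stream, 64, 300))
    print("64-entry ROB + runahead:", simulate(stream, 64, 300, runahead=True))
    print("2048-entry window:", simulate(stream, 2048, 300))
```

In this synthetic stream the 64-entry window loses about `mem_latency - window` cycles per miss, the runahead variant hides roughly every other miss, and the 2048-entry window overlaps all of them. These figures come from the toy model only and say nothing about the measured speedups reported in the abstract.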
