Conference: Annual International Symposium on Computer Architecture

Instruction prefetching of systems codes with layout optimized for reduced cache misses

Abstract

High-performing on-chip instruction caches are crucial for keeping fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching. The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: it needs one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time by toggling a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.
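
The abstract does not spell out what the compiler hint means, so the following Python sketch rests on one assumption: the guard bit on a control transfer instruction marks the end of a run of code that the compiler laid out sequentially, and the prefetcher keeps fetching successive lines until it meets that bit. The function name, the 32-byte line size, the `max_lines` cap, and the `guard_bits` map are all hypothetical and do not come from the paper; `enabled` stands in for the per-page TLB bit mentioned above.

```python
LINE_SIZE = 32  # bytes per instruction-cache line (assumed, not from the paper)


def guarded_sequential_prefetch(miss_addr, guard_bits, enabled=True, max_lines=4):
    """Return the line addresses a hypothetical guarded prefetcher would request.

    miss_addr  -- byte address of the instruction fetch that missed
    guard_bits -- dict: line address -> True if a control transfer instruction
                  in that line carries the guard bit (assumed to mean
                  "the sequential run ends here")
    enabled    -- models the TLB bit that switches the scheme on or off
    max_lines  -- cap on how far ahead the prefetcher may run (assumed)
    """
    if not enabled:
        return []

    # Align the miss address to the start of its cache line.
    current = miss_addr - (miss_addr % LINE_SIZE)
    prefetches = []

    # Keep requesting the next sequential line until a guard bit is seen
    # in the line just examined, or the cap is reached.
    while len(prefetches) < max_lines and not guard_bits.get(current, False):
        current += LINE_SIZE
        prefetches.append(current)
    return prefetches


if __name__ == "__main__":
    guards = {0x1040: True}  # the run starting at 0x1000 ends in line 0x1040
    print([hex(a) for a in guarded_sequential_prefetch(0x1000, guards)])
```

Under these assumptions, setting `max_lines=1` with no guard bits behaves like a plain next-line sequential prefetcher, which is consistent with the abstract's claim that only minor extensions to such a prefetcher are needed.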
