Instruction prefetching of systems codes with layout optimized for reduced cache misses



Abstract

High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching. The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time with the toggling of a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.
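
The mechanism in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the abstract says only that one bit is encoded in control transfer instructions and that a TLB bit turns the scheme on and off. The meaning given to the bit here (it marks the end of a sequentially laid-out run, so prefetching stops at it) is an assumption, and the names Instr, guard, lines_to_prefetch, tlb_enable, LINE_SIZE and max_lines are all hypothetical.

```python
from dataclasses import dataclass

LINE_SIZE = 4  # instructions per cache line (toy value, an assumption)


@dataclass
class Instr:
    is_control_transfer: bool = False
    # Hypothetical compiler hint bit. ASSUMPTION: when set on a control
    # transfer, it marks the end of a sequentially laid-out code run.
    guard: bool = False


def lines_to_prefetch(code, miss_line, tlb_enable, max_lines=8):
    """Return the cache-line numbers to prefetch after a miss on miss_line.

    With tlb_enable False, fall back to plain next-line prefetching.
    With tlb_enable True, keep prefetching sequential lines until a line
    holding a guarded control transfer has been reached (assumed policy).
    """
    num_lines = (len(code) + LINE_SIZE - 1) // LINE_SIZE

    if not tlb_enable:
        nxt = miss_line + 1
        return [nxt] if nxt < num_lines else []

    prefetched = []
    line = miss_line
    while len(prefetched) < max_lines:
        instrs = code[line * LINE_SIZE:(line + 1) * LINE_SIZE]
        # A guarded control transfer in this line ends the sequential run.
        if any(i.is_control_transfer and i.guard for i in instrs):
            break
        line += 1
        if line >= num_lines:
            break
        prefetched.append(line)
    return prefetched
```

In this model, an unguarded control transfer does not interrupt prefetching, while a guarded one does, so the prefetcher runs ahead only as far as the assumed layout hint allows. Passing tlb_enable=False reduces it to an ordinary next-line prefetcher, which mirrors the abstract's description of the scheme as a small extension that can be switched off at run time.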


Bibliographic details

Source: Annual International Symposium on Computer Architecture; International Symposium on Computer Architecture, 1996, pp. 271-282 (12 pages)
Authors: Chun Xia; Josep Torrellas
Original format: PDF
Chinese Library Classification: General structure, system architecture

Similar literature


1. Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders [J]. Chuanjun Zhang. Computer Architecture News, 2006, no. 2.
2. Fast indexing for blocked array layouts to reduce cache misses [J]. Evangelia Athanasaki, Nectarios Koziris. International Journal of High Performance Computing and Networking, 2005, no. 5/6.
3. Refactoring Intermediately Executed Code to Reduce Cache Capacity Misses [J]. Kristof Beyls, Erik H. D'Hollander. Journal of Instruction-Level Parallelism, 2008, no. 2008.
4. Code positioning to reduce instruction cache misses in signal processing applications on multimedia RISC processors [C]. Stolberg, H.-J., Ikekawa, . 1997.
5. Code layout optimization for remote progressive code update for networked embedded systems [D]. Kim, Jinsik. 2010.
6. Combining Instruction Prefetching with Partial Cache Locking to Improve WCET in Real-Time Systems [O]. Fan Ni, Xiang Long, Han Wan.
7. Instruction Prefetching of Systems Codes With Layout Optimized for Reduced Cache Misses [O]. Chun Xia, Josep Torrellas. 1996.
