Journal: IEEE Transactions on Computers

CPU cache prefetching: Timing evaluation of hardware implementations

Abstract

Prefetching into CPU caches has long been known to be effective in reducing the cache miss ratio, but known implementations of prefetching have been unsuccessful in improving CPU performance. The reasons are that prefetches interfere with normal cache operations by keeping the cache address and data ports, the memory bus, and the memory banks busy, and that a prefetch is not necessarily complete by the time the prefetched data is actually referenced. In this paper, we present extensive quantitative results of a detailed cycle-by-cycle trace-driven simulation of a uniprocessor memory system in which we vary most of the relevant parameters in order to determine whether and when hardware prefetching is useful. We find that, in order for prefetching to actually improve performance, the address array needs to be double ported and the data array needs to be either double ported or fully buffered. It is also very helpful for the bus to be very wide (e.g., 16 bytes), for bus transactions to be split, and for main memory to be interleaved. Under the best circumstances, i.e., with a significant investment in extra hardware, prefetching can significantly improve performance. For implementations without adequate hardware, prefetching often decreases performance.
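
The gap the abstract describes between miss ratio and actual run time can be illustrated with a toy trace-driven model. The sketch below is not the paper's simulator: it assumes a direct-mapped cache, a simple next-line prefetch issued after every reference, a single non-split memory bus, and a fixed miss penalty. The function name `simulate` and all of its parameters are hypothetical and chosen only for illustration.

```python
def simulate(trace, num_lines=64, line_size=16, miss_penalty=10,
             prefetch=False, dual_ported=True):
    """Toy cycle-count model (an assumption-laden sketch, not the paper's).

    Returns (miss_ratio, total_cycles) for a direct-mapped cache.
    """
    tags = [None] * num_lines    # block number held by each cache line
    ready_at = [0] * num_lines   # cycle at which each line's fill completes
    bus_free = 0                 # cycle at which the memory bus is next idle
    cycle = 0
    misses = 0

    def fill(block, now):
        # Fetch a block over the single bus; transfers are serialized.
        nonlocal bus_free
        start = max(now, bus_free)
        done = start + miss_penalty
        bus_free = done
        slot = block % num_lines
        tags[slot] = block
        ready_at[slot] = done
        return done

    for addr in trace:
        block = addr // line_size
        slot = block % num_lines
        cycle += 1                       # tag lookup and data access
        if tags[slot] != block:
            misses += 1
            cycle = fill(block, cycle)   # demand miss: stall until data arrives
        elif ready_at[slot] > cycle:
            cycle = ready_at[slot]       # prefetch still in flight: wait for it
        if prefetch:
            nxt = block + 1
            if not dual_ported:
                cycle += 1               # prefetch lookup steals the only port
            if tags[nxt % num_lines] != nxt:
                fill(nxt, cycle)         # occupies the bus, CPU does not wait
    return misses / len(trace), cycle


if __name__ == "__main__":
    sequential = list(range(0, 4096, 4))    # four references per 16-byte line
    strided = list(range(0, 8192, 32))      # touches every other line
    for name, trace in (("sequential", sequential), ("strided", strided)):
        for pf, dual in ((False, True), (True, True), (True, False)):
            ratio, cycles = simulate(trace, prefetch=pf, dual_ported=dual)
            print(name, "prefetch=%s dual_ported=%s" % (pf, dual),
                  "miss_ratio=%.3f cycles=%d" % (ratio, cycles))
```

In this model, the sequential trace benefits from prefetching, and more so when the prefetch lookup does not compete for the cache port. On the strided trace every prefetched line is useless, yet each one still holds the bus, so demand misses wait behind it and the run takes longer than with no prefetching at all, even though the miss ratio is unchanged.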
