Conference: High Performance Computer Architecture, 2009 (HPCA 2009)

Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems

Abstract

Linked data structure (LDS) accesses are critical to the performance of many large-scale applications. Techniques have been proposed to prefetch such accesses. Unfortunately, many LDS prefetching techniques 1) generate a large number of useless prefetches, thereby degrading performance and bandwidth efficiency, 2) require significant hardware or storage cost, or 3) when employed together with stream-based prefetchers, cause significant resource contention in the memory system. As a result, existing processors do not employ LDS prefetchers even though they commonly employ stream-based prefetchers. This paper proposes a low-cost hardware/software cooperative technique that enables bandwidth-efficient prefetching of linked data structures. Our solution has two new components: 1) a compiler-guided prefetch filtering mechanism that informs the hardware about which pointer addresses to prefetch, and 2) a coordinated prefetcher throttling mechanism that uses run-time feedback to manage the interference between multiple prefetchers (LDS and stream-based) in a hybrid prefetching system. Evaluations on a set of memory- and pointer-intensive applications show that the proposed solution improves average performance by 22.5% while decreasing memory bandwidth consumption by 25% over a baseline system that employs an effective stream prefetcher. We compare our proposal to three different LDS/correlation prefetching techniques and find that it provides significantly better performance on both single-core and multi-core systems, while requiring less hardware cost.
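
The two components named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the hint table layout, the statistics fields, the threshold values (`acc_min`, `cov_min`), the number of aggressiveness levels, and all function names are hypothetical and chosen for illustration. `filter_pointers` models a compiler-supplied filter that lets only selected pointers in a fetched cache line be prefetched, and `next_level` models feedback-driven throttling in which each prefetcher adjusts its aggressiveness based on its own accuracy and coverage relative to the other prefetcher.

```python
from dataclasses import dataclass

# Hypothetical compiler hints: for each load PC, the byte offsets within the
# fetched cache line whose pointers are assumed to be worth prefetching.
HINTS = {0x400A10: {0, 16}, 0x400B24: {8}}


def filter_pointers(load_pc, candidates, hints=HINTS):
    """Return only the candidate pointers that the (assumed) hints allow.

    `candidates` maps a byte offset in the cache line to the pointer value
    found at that offset. Loads without hints prefetch nothing.
    """
    allowed = hints.get(load_pc, set())
    return [addr for off, addr in sorted(candidates.items()) if off in allowed]


@dataclass
class IntervalStats:
    """Run-time feedback collected for one prefetcher over one interval."""
    issued: int            # prefetches sent to memory
    useful: int            # prefetched lines later used by demand accesses
    uncovered_misses: int  # demand misses this prefetcher did not cover

    @property
    def accuracy(self):
        return self.useful / self.issued if self.issued else 0.0

    @property
    def coverage(self):
        total = self.useful + self.uncovered_misses
        return self.useful / total if total else 0.0


MAX_LEVEL = 4  # assumed number of aggressiveness steps above 0


def next_level(level, own, rival, acc_min=0.4, cov_min=0.2):
    """Pick the next aggressiveness level for one prefetcher.

    Illustrative policy (thresholds are assumptions): throttle down when this
    prefetcher is inaccurate and covers no more misses than its rival;
    throttle up when it is both accurate and covering misses; else hold.
    """
    if own.accuracy < acc_min and own.coverage <= rival.coverage:
        return max(0, level - 1)
    if own.accuracy >= acc_min and own.coverage >= cov_min:
        return min(MAX_LEVEL, level + 1)
    return level
```

For example, with an inaccurate LDS prefetcher (`IntervalStats(100, 10, 90)`) and an accurate stream prefetcher (`IntervalStats(100, 80, 20)`), this policy lowers the LDS prefetcher's level and raises the stream prefetcher's level in the next interval.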