International Conference on Parallel Architectures and Compilation Techniques

XStream: Cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs

Abstract

Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential applications running on a multicore system. A single parallel application running on a multicore system behaves differently: its cores share and communicate data and code among themselves, so there is commonality in the demand miss streams across cores. This creates an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with that of an ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geometric mean), compared with state-of-the-art spatial memory streaming, the storage-efficient XStream reduces execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.
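The commonality between the demand miss streams of two cores can be illustrated with a small software model. The sketch below is not XStream's hardware mechanism. It is a toy measurement, under assumed parameters, of how many of one core's missed blocks already appear in the spatial patterns recorded from another core. The 64-byte block size, the 32-block region size, and the names `spatial_patterns` and `cross_core_coverage` are all assumptions made for illustration and do not come from the paper.

```python
# Toy model only: block and region sizes are assumed, not taken from the paper.
BLOCK_SIZE = 64          # bytes per cache block (assumption)
BLOCKS_PER_REGION = 32   # cache blocks per spatial region (assumption)


def spatial_patterns(miss_addresses):
    """Group demand-miss addresses into {region number: set of block offsets}."""
    patterns = {}
    for addr in miss_addresses:
        block = addr // BLOCK_SIZE
        region, offset = divmod(block, BLOCKS_PER_REGION)
        patterns.setdefault(region, set()).add(offset)
    return patterns


def cross_core_coverage(producer_misses, consumer_misses):
    """Fraction of the consumer core's distinct missed blocks that also
    appear in the producer core's spatial patterns (hypothetical metric)."""
    producer = spatial_patterns(producer_misses)
    consumer = spatial_patterns(consumer_misses)
    total = sum(len(offsets) for offsets in consumer.values())
    if total == 0:
        return 0.0
    covered = sum(
        len(offsets & producer.get(region, set()))
        for region, offsets in consumer.items()
    )
    return covered / total


# Made-up miss streams for two cores of one parallel application.
core0_misses = [0x1000, 0x1040, 0x1080, 0x2000]
core1_misses = [0x1040, 0x1080, 0x10C0, 0x3000]
print(cross_core_coverage(core0_misses, core1_misses))
```

In this made-up trace, two of the four blocks missed by core 1 fall inside patterns already seen on core 0. Those are the blocks that a cross-core scheme could, in principle, forward ahead of time to core 1's MLC prefetcher.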
