Journal of Parallel and Distributed Computing

Improving effective bandwidth through compiler enhancement of global cache reuse


Abstract

The performance of modern machines is increasingly limited by insufficient memory bandwidth. One way to alleviate this bandwidth limitation for a given program is to minimize the aggregate data volume the program transfers from memory. In this article we present compiler strategies for accomplishing this minimization. Following a discussion of the underlying causes of bandwidth limitations, we present a two-step strategy to exploit global cache reuse: the temporal reuse across the whole program and the spatial reuse across the entire data set used in that program. In the first step, we fuse computation on the same data using a technique called reuse-based loop fusion to integrate loops with different control structures. We prove that optimal fusion for bandwidth is NP-hard and we explore the limitations of computation fusion using perfect program information. In the second step, we group data used by the same computation through the technique of affinity-based data regrouping, which intermixes the storage assignments of program data elements at different granularities. We show that the method is compile-time optimal and can be used on array and structure data. We prove that two extensions (partial and dynamic data regrouping) are NP-hard problems. Finally, we describe our compiler implementation and experiments demonstrating that the new global strategy, on average, reduces memory traffic by over 40% and improves execution speed by over 60% on two high-end workstations.
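The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' compiler algorithm: it assumes the simplest case of two loops with identical bounds (the paper's reuse-based fusion targets loops with different control structures), and every function name below is hypothetical. Python lists do not model a hardware cache, so the sketch only shows the shape of the transformations and checks that they preserve results.

```python
# Illustrative sketch only: all names are hypothetical, and the code shows the
# form of the transformations, not their performance effect.

def unfused(a, b):
    """Two separate loops; each one streams through the whole of `a`."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):      # first loop reads a[i]
        c[i] = a[i] + b[i]
    for i in range(n):      # second loop reads a[i] again, long after the first use
        d[i] = a[i] * 2.0
    return c, d


def fused(a, b):
    """Loop fusion: a[i] is read once and reused by both computations."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = a[i]            # single load, two uses (temporal reuse)
        c[i] = x + b[i]
        d[i] = x * 2.0
    return c, d


def regroup(a, b):
    """Data regrouping: interleave arrays that are always accessed together,
    giving the layout [a0, b0, a1, b1, ...] (spatial reuse)."""
    assert len(a) == len(b)
    ab = [0.0] * (2 * len(a))
    ab[0::2] = a            # even slots hold elements of a
    ab[1::2] = b            # odd slots hold elements of b
    return ab


def fused_regrouped(ab):
    """The fused loop rewritten against the interleaved layout."""
    n = len(ab) // 2
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = ab[2 * i]               # a[i]
        c[i] = x + ab[2 * i + 1]    # b[i] sits in the adjacent slot
        d[i] = x * 2.0
    return c, d
```

On a real machine, the fused loop would be expected to fetch each element of `a` from memory once instead of twice for arrays larger than the cache, and the interleaved layout would place `a[i]` and `b[i]` in the same cache block. Both effects depend on the hardware and are not measured here.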
