IEEE Transactions on Parallel and Distributed Systems

Reducing false sharing and improving spatial locality in a unified compilation framework

Abstract

The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coherence unit, and the spatial locality exhibited by the applications, in addition to the amount of parallelism in the applications. Large coherence units are helpful in exploiting spatial locality, but worsen the effects of false sharing. A mathematical framework that allows a clean description of the relationship between spatial locality and false sharing is derived in this paper. First, a technique to identify a severe form of multiple-writer false sharing is presented. The importance of the interaction between optimization techniques aimed at enhancing locality and the techniques oriented toward reducing false sharing is then demonstrated. Given the conflicting requirements, a compiler-based approach to this problem holds promise. This paper investigates the use of data transformations in addressing spatial locality and false sharing, and derives an approach that balances the impact of the two. Experimental results demonstrate that such a balanced approach outperforms those approaches that consider only one of these two issues. On an eight-processor SGI/Cray Origin 2000 multiprocessor, our approach brings an additional 9 percent improvement over a powerful locality optimization technique that uses both loop and data transformations. The presented approach also obtains an additional 19 percent improvement over an optimization technique that is oriented specifically toward reducing false sharing. This study also reveals that, in addition to reducing synchronization costs and improving the memory subsystem performance, obtaining large granularity parallelism is helpful in balancing the effects of enhancing locality and reducing false sharing, rendering them compatible.
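The abstract does not spell out the framework itself. As a rough illustration of the interaction it describes, the Python sketch below models a hypothetical N×N array whose columns are block-partitioned across processors, with each processor writing only the elements it owns. For row-major and column-major layouts it counts (a) coherence units written by more than one processor, which is multiple-writer false sharing because no element is actually shared, and (b) coherence units touched, summed over processors, as a crude proxy for spatial locality. The schedule, sizes, and the names `column_owner`, `offset`, and `analyze` are illustrative assumptions, not the paper's method. The model ignores cache capacity, timing, and loop order.

```python
from collections import defaultdict


def column_owner(j: int, n: int, procs: int) -> int:
    """Hypothetical schedule: columns are block-partitioned across processors."""
    block = (n + procs - 1) // procs
    return j // block


def offset(i: int, j: int, n: int, layout: str) -> int:
    """Element offset of A[i][j] in memory for a given array layout."""
    if layout == "row":  # row-major (C order)
        return i * n + j
    if layout == "col":  # column-major, i.e. the transposed data layout
        return j * n + i
    raise ValueError(f"unknown layout: {layout}")


def analyze(n: int, procs: int, line_elems: int, layout: str) -> tuple[int, int]:
    """Model one parallel loop where each processor writes the columns it owns.

    Returns (shared, touches):
      shared:  coherence units written by more than one processor
      touches: coherence units touched, summed over processors
               (lower means better spatial locality for the same work)
    """
    writers: dict[int, set[int]] = defaultdict(set)  # unit -> writing processors
    for i in range(n):
        for j in range(n):
            unit = offset(i, j, n, layout) // line_elems
            writers[unit].add(column_owner(j, n, procs))
    shared = sum(1 for ps in writers.values() if len(ps) > 1)
    touches = sum(len(ps) for ps in writers.values())
    return shared, touches


if __name__ == "__main__":
    n, procs = 16, 8
    # Larger coherence units: fewer units per processor, but more false sharing.
    for line_elems in (1, 2, 4, 8):
        print("row-major, unit =", line_elems, analyze(n, procs, line_elems, "row"))
    # A data transformation (switching to column-major) removes the sharing here.
    print("col-major, unit = 4", analyze(n, procs, 4, "col"))
```

With 16×16 elements on 8 processors (two columns each), growing the unit from 1 to 2 elements halves the units touched (256 to 128) with no sharing. Units of 4 or 8 elements, however, are each written by 2 or 4 processors. Switching to a column-major layout at a unit size of 4 eliminates the shared units and halves the touches again (to 64). In this toy case one data transformation serves both goals. The abstract's point is that, in general, the two requirements can conflict, so the compiler must balance them.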
