
Dynamic Cache Contention Detection in Multi-threaded Applications


Abstract

In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. Precisely identifying such bottlenecks requires a detailed understanding of a given multi-threaded application's behavior. Traditionally, however, this diagnostic information could be obtained only after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and the parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach (a 5x slowdown on average relative to native execution) is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to a factor of 12. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
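
The key insight above, that sharing depends mainly on cache line size and on which bytes each thread touches, can be illustrated with a small sketch. This is not the tool described in the abstract. It is a hypothetical, simplified model: a dictionary stands in for shadow memory, accesses are fed in by hand rather than by dynamic instrumentation, the 64-byte line size is an assumption, and the order of accesses in time is ignored. The names SharingDetector, access, and classify are invented for illustration.

```python
from collections import defaultdict


class SharingDetector:
    """Toy shadow-memory model (hypothetical, not the paper's tool)."""

    def __init__(self, line_size=64):  # 64-byte lines are an assumption
        self.line_size = line_size
        # Shadow state: line index -> thread id -> bytes read / bytes written.
        self.shadow = defaultdict(
            lambda: defaultdict(lambda: {"r": set(), "w": set()})
        )

    def access(self, tid, addr, size, is_write):
        """Record that thread `tid` touched `size` bytes starting at `addr`."""
        kind = "w" if is_write else "r"
        for byte in range(addr, addr + size):
            self.shadow[byte // self.line_size][tid][kind].add(byte)

    def classify(self, line):
        """Classify one cache line from the accesses recorded so far."""
        per_thread = self.shadow.get(line, {})
        tids = list(per_thread)
        if len(tids) < 2:
            return "private"
        if not any(per_thread[t]["w"] for t in tids):
            return "read-shared"  # no writer, so no invalidations
        for i, a in enumerate(tids):
            for b in tids[i + 1:]:
                wrote_a, wrote_b = per_thread[a]["w"], per_thread[b]["w"]
                touched_a = wrote_a | per_thread[a]["r"]
                touched_b = wrote_b | per_thread[b]["r"]
                # True sharing: one thread wrote a byte the other also used.
                if wrote_a & touched_b or wrote_b & touched_a:
                    return "true sharing"
        # Written line, several threads, but disjoint bytes.
        return "false sharing"


detector = SharingDetector()
detector.access(tid=0, addr=0, size=4, is_write=True)    # line 0, bytes 0-3
detector.access(tid=1, addr=4, size=4, is_write=True)    # line 0, bytes 4-7
detector.access(tid=0, addr=64, size=4, is_write=True)   # line 1, bytes 64-67
detector.access(tid=1, addr=64, size=4, is_write=False)  # line 1, same bytes
print(detector.classify(0))  # false sharing
print(detector.classify(1))  # true sharing
```

Nothing in the classification refers to cache sizes, associativity, or thread placement, which is the point of the insight: the same verdicts hold on any machine with the assumed line size. A real detector would also need to track the ordering of accesses to count how often a line actually bounces between threads.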
